[hts-users:04241] Re: manipulating a trained voice

From: Alexis Moinet <alexis.moinet@umons.ac.be>

Date: Thu, 12 Mar 2015 12:02:32 +0100

In hts_engine, you can edit the function HTS_Vocoder_synthesize in lib/HTS_vocoder.c and replace the line

p = v->rate / exp(lf0);

with

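p = v->rate / 110.0;

Note that p is the pitch period in samples (the sampling rate divided by f0 in Hz), so a constant pitch of 110 Hz means dividing the rate by 110 rather than assigning 110 directly. I think this should override any value generated from the model (for voiced frames). However, this means recompiling hts_engine every time you want to change the value.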

To avoid this, you could also modify hts_engine.c so that the constant pitch value is passed on the command line (where it ends up in the "argv" of main()) and propagated to the vocoder, for instance by adding a dedicated variable to the struct _HTS_Vocoder. This way you could run "hts_engine ... -c 110" (assuming that "-c" is the option name you choose for the constant pitch value) and synthesize with a constant pitch of 110 Hz.
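The command-line approach could look roughly like the sketch below. It does not mirror the actual option parser in hts_engine.c: the struct VocoderOptions is a hypothetical stand-in for the extra field one would add to struct _HTS_Vocoder, and parse_const_pitch is an invented helper name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the field added to struct _HTS_Vocoder. */
typedef struct {
   double const_f0;   /* constant pitch in Hz; <= 0 means "use the model" */
} VocoderOptions;

/* Scans argv for "-c <value>" (the option name is an assumption).
   Returns 1 and stores the value if a positive number follows "-c",
   otherwise returns 0 and leaves const_f0 at 0. */
static int parse_const_pitch(int argc, char **argv, VocoderOptions *opt)
{
   int i;

   opt->const_f0 = 0.0;
   for (i = 1; i + 1 < argc; i++) {
      if (strcmp(argv[i], "-c") == 0) {
         char *end;
         double f0 = strtod(argv[i + 1], &end);

         /* accept only a fully parsed, strictly positive number */
         if (end != argv[i + 1] && *end == '\0' && f0 > 0.0) {
            opt->const_f0 = f0;
            return 1;
         }
         return 0;
      }
   }
   return 0;
}
```

The parsed value would then be copied into the vocoder struct before synthesis, and the vocoder would divide the sampling rate by it instead of by exp(lf0) whenever it is positive.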

Alexis

PS: a bit of off-topic self-advertisement (sorry for that): in MAGE (our realtime library based on hts_engine 1.06), you can set the pitch at runtime to whatever constant or variable value you want while the speech is being produced. However, I'm not sure how well it matches your use case (for instance, we use limited phonemic context, though you can configure it to use the complete one).

On 11/03/15 20:09, Erica Cooper wrote:

Hi all,

I was wondering whether it is possible to manipulate the parameters of a voice trained with HTS, and if so, how to go about doing this. I know that it is possible to manipulate the pitch contour or the duration of an utterance using Praat, which resynthesizes the utterance, but for doing something very simple and global, such as assigning a completely flat f0 contour to all output utterances, is there a way to do this by manipulating the trained models, or perhaps at the synthesis stage using hts_engine?

Thanks,
Erica