[hts-users:04242] Re: manipulating a trained voice

From: Alexis Moinet <alexis.moinet@umons.ac.be>

Date: Thu, 12 Mar 2015 12:05:46 +0100

Hello

In hts_engine, you can edit the function HTS_Vocoder_synthesize() in lib/HTS_vocoder.c and replace the line:

p = v->rate / exp(lf0);

with

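p = v->rate / 110.0;

Because p is the pitch period in samples (the sampling rate divided by F0 in Hz), this should override any pitch value generated from the model (for voiced frames) with a constant 110 Hz. Writing p = 110 instead would set the period to 110 samples, not the frequency to 110 Hz. However, this means recompiling hts_engine every time you want to change the value. Note that in the most recent versions of hts_engine, there are also the lines

p = v->rate / MIN_F0;
p = v->rate / MAX_F0;

which should be replaced with p = v->rate / 110.0; as well.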

To avoid recompiling every time, you could also modify hts_engine.c so that the constant pitch value is passed as a command-line argument of the executable (it would end up in the "argv" of main()) and then propagated to the vocoder, for example by adding a dedicated variable to the struct _HTS_Vocoder. This way you could run "hts_engine ... -c 110" (assuming that "-c" is the option name you choose for your constant pitch value) and synthesize with a constant pitch of 110 Hz.
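
A rough sketch of the command-line part could look like the following. Everything here is hypothetical: the SketchVocoder struct only stands in for the extra variable you would add to _HTS_Vocoder, "-c" is just the option name assumed above, and the two functions do not exist in hts_engine. The real option parsing in hts_engine.c is organized differently, so take this as an outline of the idea rather than a patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the vocoder struct, with one added field. */
typedef struct {
   double rate;      /* sampling rate in Hz */
   double const_f0;  /* constant F0 in Hz, 0.0 = use the model */
} SketchVocoder;

/* Hypothetical: scan argv for "-c <value>" and return the value in Hz.
   Returns 0.0 if the option is absent or the value is not a
   positive number. */
double sketch_parse_const_f0(int argc, char **argv)
{
   int i;

   for (i = 1; i + 1 < argc; i++) {
      if (strcmp(argv[i], "-c") == 0) {
         char *end;
         double f0 = strtod(argv[i + 1], &end);
         if (end != argv[i + 1] && *end == '\0' && f0 > 0.0)
            return f0;
         return 0.0;
      }
   }
   return 0.0;
}

/* Hypothetical: propagate the parsed value to the vocoder. */
void sketch_vocoder_set_const_f0(SketchVocoder *v, double f0)
{
   assert(v != NULL);
   v->const_f0 = (f0 > 0.0) ? f0 : 0.0;
}
```

The synthesis function would then check const_f0 for each voiced frame and, when it is positive, compute the period from it instead of from the generated lf0.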

Hope it helps,

Alexis

PS: a bit of self-advertisement (sorry for that): in MAGE (our realtime library based on hts_engine 1.06), you can set the pitch at runtime to whatever value (constant or variable) you want while the speech is being produced. However, I'm not sure how well it matches your use case (for instance, we use a limited phonemic context, though you can configure it to use the complete one).

On 11/03/15 20:09, Erica Cooper wrote:

Hi all,

I was wondering whether it is possible to manipulate the parameters of a voice trained with HTS, and if so, how to go about doing this. I know that it is possible to manipulate the pitch contour or the duration of an utterance using Praat, which resynthesizes the utterance, but for doing something very simple and global, such as assigning a completely flat f0 contour to all output utterances, is there a way to do this by manipulating the trained models, or perhaps at the synthesis stage using hts_engine?

Thanks,
Erica