
[hts-users:02180] Re: problem with volume after adaptation


Two tips.

Before you convert the float values to short values for the raw waveform
using x2x,

$MGLSADF -m ".($ordr{'mgc'}-1)." -p $fs -a $fw -g $gm $mgc | "
"$X2X +fs | "
"$SOX -c 1 -s -w -t raw -r $sr - -c 1 -s -w -t wav -r $sr $gendir/$base.wav";

you can scale their amplitude with sopr if necessary (here, dividing by 2):

$MGLSADF -m ".($ordr{'mgc'}-1)." -p $fs -a $fw -g $gm $mgc | "
"$SOPR -d 2 | "
"$X2X +fs | "
"$SOX -c 1 -s -w -t raw -r $sr - -c 1 -s -w -t wav -r $sr $gendir/$base.wav";

Sometimes the values exceed the 16-bit integer maximum of 32767 because of the Gaussian nature of the models.
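
As a rough way to choose the divisor instead of hard-coding 2, the peak of the float samples can be checked against the 16-bit range before conversion. The following is only a sketch in Python, not part of HTS or SPTK; the function names, the 0.95 headroom factor and the sample values are all assumptions made for illustration.

```python
import struct

INT16_MAX = 32767
INT16_MIN = -32768

def safe_divisor(samples, headroom=0.95):
    """Smallest divisor (>= 1.0) that brings the peak down to headroom * INT16_MAX."""
    peak = max((abs(s) for s in samples), default=0.0)
    limit = headroom * INT16_MAX
    return max(1.0, peak / limit)

def to_int16_bytes(samples, divisor=1.0):
    """Divide, round, clip as a last resort, and pack as little-endian int16."""
    out = []
    for s in samples:
        v = int(round(s / divisor))
        v = max(INT16_MIN, min(INT16_MAX, v))  # hard clip only if still out of range
        out.append(v)
    return struct.pack("<%dh" % len(out), *out)

# Hypothetical float samples, as they might come out of the synthesis filter
samples = [1200.0, -40000.0, 35000.5, 10.0]
d = safe_divisor(samples)          # a value that could be passed to "sopr -d"
raw = to_int16_bytes(samples, d)
```

If the divisor comes out as 1.0 the waveform already fits and no scaling is needed.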

If you forcibly scale the amplitude and still have a problem with
power or amplitude, you should compare the mean vectors of the GV models
(especially the GV for the C0 term) for the target speaker with those for
other speakers.

If the target speaker has a higher GV value for the C0 term than the others,
you would need to rethink how amplitude is normalized between
speakers and within each speaker.
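
The comparison can be sketched roughly as follows, assuming the C0 trajectory of each utterance is available as a list of floats and that GV is taken as the per-utterance variance of that trajectory (the actual HTS GV models may be estimated differently, for example on a log scale). The function names and the toy data are hypothetical.

```python
from statistics import fmean, pvariance

def utterance_gv(c0_track):
    """Variance of the C0 trajectory within one utterance."""
    return pvariance(c0_track)

def gv_mean(utterances):
    """Mean of the per-utterance C0 variances for one speaker (or speaker pool)."""
    return fmean([utterance_gv(u) for u in utterances])

def gv_ratio(target_utts, other_utts):
    """Ratio > 1 means the target speaker has a larger C0 GV than the others."""
    return gv_mean(target_utts) / gv_mean(other_utts)

# Toy C0 trajectories, invented for illustration only
target = [[0.0, 2.0, 4.0], [1.0, 3.0, 5.0]]
others = [[0.0, 1.0, 2.0]]
ratio = gv_ratio(target, others)
```

A ratio well above 1 would point to the amplitude normalization rather than to the adaptation itself.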

If the target speaker has almost the same GV values as the others but
the issue persists, your adaptation is failing to transform the C0 terms.
Please adjust the tuning parameters for adaptation, such as SPLITTHRESH
and SMAPSIGMA. You may also use separate block transforms for them,
e.g. for 40 static features: HADAPT:BLOCKSIZE = "IntVec 6 1 39 1 39 1 39"
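
To see where the numbers in that block-size vector come from, here is a small Python sketch that builds it. It assumes three windows (static, delta, delta-delta), with C0 placed first in each and given its own 1x1 block; the helper names are hypothetical and not part of HTS.

```python
def split_c0_blocks(static_dim, n_windows=3):
    """For each window, one block for C0 and one for the remaining coefficients."""
    blocks = []
    for _ in range(n_windows):
        blocks += [1, static_dim - 1]
    return blocks

def block_size_intvec(blocks):
    """Format block sizes as 'IntVec <count> <size> <size> ...'."""
    return "IntVec %d %s" % (len(blocks), " ".join(str(b) for b in blocks))

blocks = split_c0_blocks(40)
# The block sizes must add up to the full feature dimension (3 x 40 = 120)
assert sum(blocks) == 3 * 40
line = 'HADAPT:BLOCKSIZE = "%s"' % block_size_intvec(blocks)
print(line)
```

For a different static order, only the argument to split_c0_blocks changes.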


On 18 Aug 2009, at 22:56, Tóth Bálint wrote:

Dear Junichi Yamagishi,

Thank you very much for your answer.

I’ve checked, and HTS calculates the GV models from all speakers (the average voice speakers as well as the target speaker). When GV is calculated from all speakers there are many more overshoots, but even when GV is calculated only from the target speaker, the synthesized voice (SAT+adaptation) still overshoots often.

I normalized all the audio data in the same way.

With SI+adaptation (using the same training and adaptation data), overshoots are quite rare in the synthesized speech, but there are still some.

Any help is highly appreciated.

Best Regards,
Balint Toth

Junichi Yamagishi wrote:

Did you calculate the GV models on the adaptation data for the target speaker?
Sometimes GV models calculated for the average voice are too big for some
speakers. (This would be crucial in the log-gain case.)

I don't remember how HTS-demo calculates this, but please check this first.

Then it might be good to normalize the amplitude level of the adaptation data to that of the training data, to avoid a bad transformation of the C0/gain terms.

Junichi Yamagishi

On 12 Aug 2009, at 22:22, Tóth Bálint wrote:


I am trying to adapt HTS to a new voice. The SAT average voice is ok: http://alpha.tmit.bme.hu/~toth.b/hts_samples/SAT.wav

but after adaptation the volume overshoots: http://alpha.tmit.bme.hu/~toth.b/hts_samples/SAT_dec_feat3.wav

The volume of the adaptation data is normal; there are no overshoots. Adaptation to other voices works well.

Could you please help me work out what the problem might be?

Thanks in advance!

Best Regards,
Balint Toth

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
