[hts-users:03244] Re: duration modeling and dur file

Hi,

The HMGenS command line and configuration are below.

HMGenS -A -B -C configs/syn.cnf -D -T 1 -S data/scp/gen.scp \
    -t 1500 100 5000 -h */*_%%%_* -c 0 \
    -H models/qst001/ver1/cmp/re_clustered_sat_all.mmf \
    -N models/qst001/ver1/dur/re_clustered_sat_all.mmf \
    -M gen/qst001/ver1/SAT+dec_feat3/0 \
    -a -J models/qst001/ver1/cmp/xforms SAT+dec_feat3 \
    -H models/qst001/ver1/cmp/regTrees/dec.base \
    -H models/qst001/ver1/cmp/regTrees/dec.tree \
    -b -Y models/qst001/ver1/dur/xforms SAT+dec_feat3 \
    -N models/qst001/ver1/dur/regTrees/dec.base \
    -N models/qst001/ver1/dur/regTrees/dec.tree \
    models/qst001/ver1/cmp/tiedlist models/qst001/ver1/dur/tiedlist

HTK Configuration Parameters[24]

  Module/Tool  Parameter          Value
  #            CDGV               TRUE
  #            GVOFFMODEL         StrVec 2 sil sp
  #            OPTKIND            NEWTON
  #            GVWEIGHT           1
  #            HMMWEIGHT          1
  #            STEPDEC            0.500000
  #            STEPINC            1.200000
  #            STEPINIT           1
  #            MINEUCNORM         0.010000
  #            GVEPSILON          0.000100
  #            MAXGVITER          50
  #            GVHMMLIST          gv/qst001/ver1/tiedlist
  #            GVMODELMMF         gv/qst001/ver1/clustered_all.mmf
  #            USEGV              TRUE
  #            EMEPSILON          0.000100
  #            MAXEMITER          20
  #            WINDIR             data/win
  #            WINFN              StrVec 3 mgc.win1 mgc.win2 mgc.win3 StrVec 3 lf0.win1 lf0.win2 lf0.win3 StrVec 3 bap.win1 bap.win2 bap.win3
  #            PDFSTREXT          StrVec 3 mgc lf0 bap
  #            PDFSTRORDER        IntVec 3 40 1 5
  #            PDFSTRSIZE         IntVec 3 1 3 1
  #            USEALIGN           TRUE
  #            NATURALWRITEORDER  TRUE
  #            NATURALREADORDER   TRUE

From: ura228@xxxxxxxxx [mailto:ura228@xxxxxxxxx] On Behalf Of Keiichiro Oura
Sent: Tuesday, April 17, 2012 1:11 AM
To: hts-users@xxxxxxxxxxxxxxx
Cc: uratec
Subject: [hts-users:03243] Re: duration modeling and dur file

Hi,

Please tell me your HMGenS options and config files.

Regards,

Keiichiro Oura

2012/4/16 Heamin Lee <oasistony@xxxxxxxxxxx>

Thanks for your answer.

Yes, I applied speaker adaptation to the duration models too.

Now I understand that the transform matrices account for the difference.

However, the more serious problem is that the duration is set to 1 for every state, regardless of the mean.

Do you have any solution or idea for this problem?

*.dur

AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[2]: duration=1 (frame), mean=1.313743e+00
AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[3]: duration=1 (frame), mean=2.475758e+00
AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[4]: duration=1 (frame), mean=3.064046e+00
AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[5]: duration=1 (frame), mean=3.948951e+00
AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[6]: duration=1 (frame), mean=3.052463e+00
AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33: duration=5 (frame), mean=1.385496e+01
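
A listing like the one above can be checked mechanically. The following is a minimal Python sketch and not part of HTS: the function name parse_dur and the regular expression are assumptions based only on the lines quoted in this thread. It parses the *.dur lines, confirms that the model-level mean is the sum of the state-level means, and lists the states whose assigned duration is below the rounded mean.

```python
import re

# One line of the *.dur listing: an optional ".state[N]" suffix on the label,
# followed by the assigned duration (in frames) and the duration mean.
DUR_LINE = re.compile(
    r"^(?P<label>.+?)(?:\.state\[(?P<state>\d+)\])?: "
    r"duration=(?P<dur>\d+) \(frame\), mean=(?P<mean>\S+)$"
)

def parse_dur(text):
    """Return (state_rows, model_rows) parsed from *.dur text."""
    states, models = [], []
    for line in text.splitlines():
        m = DUR_LINE.match(line.strip())
        if not m:
            continue  # skip blank or unrelated lines
        row = {"label": m["label"], "duration": int(m["dur"]),
               "mean": float(m["mean"])}
        if m["state"] is None:
            models.append(row)
        else:
            row["state"] = int(m["state"])
            states.append(row)
    return states, models

LABEL = "AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33"
SAMPLE = "\n".join([
    LABEL + ".state[2]: duration=1 (frame), mean=1.313743e+00",
    LABEL + ".state[3]: duration=1 (frame), mean=2.475758e+00",
    LABEL + ".state[4]: duration=1 (frame), mean=3.064046e+00",
    LABEL + ".state[5]: duration=1 (frame), mean=3.948951e+00",
    LABEL + ".state[6]: duration=1 (frame), mean=3.052463e+00",
    LABEL + ": duration=5 (frame), mean=1.385496e+01",
])

states, models = parse_dur(SAMPLE)

# The model-level mean should equal the sum of the state-level means.
state_mean_sum = sum(s["mean"] for s in states)

# States whose assigned duration is smaller than the mean rounded to a frame.
suspicious = [s["state"] for s in states
              if s["duration"] < int(s["mean"] + 0.5)]

print("sum of state means:", round(state_mean_sum, 6))
print("model-level mean:  ", models[0]["mean"])
print("states below rounded mean:", suspicious)
```

For the lines quoted here, the state means sum to the model-level mean, and states 3 to 6 are the ones that received fewer frames than their rounded mean.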

From: ura228@xxxxxxxxx [mailto:ura228@xxxxxxxxx] On Behalf Of Keiichiro Oura
Sent: Monday, April 16, 2012 3:54 PM
To: hts-users@xxxxxxxxxxxxxxx
Cc: uratec
Subject: [hts-users:03240] Re: duration modeling and dur file

Hi,

Did you apply speaker adaptation to the duration models?

I expect that the speaker adaptation transform matrices account for the difference.

Regards,

Keiichiro Oura

2012/4/16 Heamin Lee <oasistony@xxxxxxxxxxx>

Hello,

I am using HTS-2.2 for speaker adaptation.

After adaptation, HMGenS outputs mgc, lf0, bap, and dur files, and I found that the mgc, lf0, and bap files are generated according to the durations in the dur file.

From the paper "Hidden Semi-Markov Model Based Speech Synthesis System", I understand that state durations are ultimately determined by the state duration mean vector.

So I expected the mean in the *.dur file to be the same as the mean vector of the duration model.

However, the mean in the *.dur file differs slightly from the mean of the duration model. For example, the *.dur file gives 1.313743e+00, while the duration model gives 1.306419e+00.

Why are they different? Is there another algorithm that generates the duration mean vector, or is something wrong with this process?

A more important problem is that sometimes the duration is set to 1 for all states, regardless of the mean.

The problem is shown below. It does not appear in speech synthesized without adaptation. Why does this problem occur?
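
For reference, the duration rule usually given for HMM-based synthesis sets each state duration to d_k = m_k + rho * s_k, where m_k and s_k are the mean and variance of the state duration distribution and rho = (T - sum of m_k) / (sum of s_k) when a total length T is imposed; with no length constraint, rho = 0 and each duration is just the mean. The sketch below illustrates that rule only. The function name, the rounding to the nearest frame, and the floor of one frame are assumptions and are not taken from the HMGenS source.

```python
def state_durations(means, variances, total_frames=None):
    """Sketch of d_k = m_k + rho * s_k, rounded to whole frames.

    The rounding and the 1-frame floor are assumptions for illustration,
    not a description of what HMGenS does internally.
    """
    if total_frames is None:
        rho = 0.0  # no length constraint: durations follow the means
    else:
        rho = (total_frames - sum(means)) / sum(variances)
    return [max(1, int(m + rho * v + 0.5)) for m, v in zip(means, variances)]

# Means and variances of the five streams of a duration model such as
# dur_s2_1623 quoted in this message (values before adaptation).
means = [1.306419, 2.176778, 3.774895, 5.540838, 2.440767]
variances = [0.1695042, 1.851734, 17.02917, 34.68322, 8.953208]

unconstrained = state_durations(means, variances)
print(unconstrained)
```

With rho = 0 the durations track the means. When a total length much shorter than the sum of the means is imposed, rho becomes strongly negative and states with large variances are pushed toward the one-frame floor first.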

*.dur

AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[2]: duration=1 (frame), mean=1.313743e+00
AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[3]: duration=1 (frame), mean=2.475758e+00
AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[4]: duration=1 (frame), mean=3.064046e+00
AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[5]: duration=1 (frame), mean=3.948951e+00
AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[6]: duration=1 (frame), mean=3.052463e+00
AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33: duration=5 (frame), mean=1.385496e+01

Duration model

~s "dur_s2_1623"
<STREAM> 1
<MEAN> 1
 1.306419e+00
<VARIANCE> 1
 1.695042e-01
<GCONST> 6.299961e-02
<STREAM> 2
<MEAN> 1
 2.176778e+00
<VARIANCE> 1
 1.851734e+00
<GCONST> 2.454000e+00
<STREAM> 3
<MEAN> 1
 3.774895e+00
<VARIANCE> 1
 1.702917e+01
<GCONST> 4.672805e+00
<STREAM> 4
<MEAN> 1
 5.540838e+00
<VARIANCE> 1
 3.468322e+01
<GCONST> 5.384133e+00
<STREAM> 5
<MEAN> 1
 2.440767e+00
<VARIANCE> 1
 8.953208e+00
<GCONST> 4.029889e+00
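
As a sanity check on the listing above: for a one-dimensional Gaussian, the HTK GCONST value is ln(2 * pi * variance). The short Python sketch below recomputes it for the five streams; the variable and function names are illustrative only. Agreement shows that the model entry is internally consistent, so the mismatch with the *.dur means does not come from a corrupted model entry.

```python
import math

# (mean, variance, gconst) for streams 1 to 5 of "dur_s2_1623", as listed.
MODEL = [
    (1.306419e+00, 1.695042e-01, 6.299961e-02),
    (2.176778e+00, 1.851734e+00, 2.454000e+00),
    (3.774895e+00, 1.702917e+01, 4.672805e+00),
    (5.540838e+00, 3.468322e+01, 5.384133e+00),
    (2.440767e+00, 8.953208e+00, 4.029889e+00),
]

def gconst(variance):
    """GCONST of a one-dimensional Gaussian: ln(2 * pi * variance)."""
    return math.log(2.0 * math.pi * variance)

# Absolute difference between the recomputed and the listed GCONST.
errors = [abs(gconst(var) - listed) for _, var, listed in MODEL]

for stream, (err, (_, var, listed)) in enumerate(zip(errors, MODEL), start=1):
    print(f"stream {stream}: listed={listed:.6f} "
          f"recomputed={gconst(var):.6f} diff={err:.1e}")
```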