[hts-users:03247] Re: duration modeling and dur file
- From: Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>
- Date: Tue, 17 Apr 2012 12:23:28 +0900
- Cc: uratec <uratec@xxxxxxxxxxxx>
- Delivered-to: hts-users@xxxxxxxxxxxxxxx
Hi,
That is strange.
Please try debugging SetStateDurations() in HTKLib/HGen.c.
That should show you why genInfo->durations[i][j] is always 1.
Regards,
Keiichiro Oura
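
For comparison while debugging, here is a rough Python sketch (not the HGen.c code) of the durations one would expect if each state's duration were simply its mean rounded to the nearest frame, with a floor of one frame. It follows the reading given in this thread that state durations are determined by the duration means. The function name expected_durations and the exact rounding rule are assumptions for illustration; HGen.c may round differently.

```python
def expected_durations(means, min_frames=1):
    """Round each state's mean duration to whole frames, with a floor.

    Assumption: plain round-half-up per state; this is an illustrative
    rule, not a copy of SetStateDurations() in HTKLib/HGen.c.
    """
    return [max(min_frames, int(m + 0.5)) for m in means]


# Means copied from the *.dur listing quoted in this thread (state[2]..state[6]).
dur_file_means = [1.313743, 2.475758, 3.064046, 3.948951, 3.052463]

durations = expected_durations(dur_file_means)
print(durations, sum(durations))
```

For these means the sketch gives 1, 2, 3, 4, 3 (13 frames in total), whereas the reported *.dur file has 1 frame for every state (5 frames in total).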
2012/4/17 Heamin Lee <oasistony@xxxxxxxxxxx>:
> Hello,
>
> I changed USEALIGN to FALSE, but the problem is still there.
>
> HMGenS -A -B -C configs/syn.cnf -D -T 1 -S data/scp/gen.scp -t 1500 100
> 5000 -h */*_%%%_* -c 0 -H models/qst001/ver1/cmp/re_clustered_sat_all.mmf -N
> models/qst001/ver1/dur/re_clustered_sat_all.mmf -M
> gen/qst001/ver1/SAT+dec_feat3/0 -a -J models/qst001/ver1/cmp/xforms
> SAT+dec_feat3 -H models/qst001/ver1/cmp/regTrees/dec.base -H
> models/qst001/ver1/cmp/regTrees/dec.tree -b -Y models/qst001/ver1/dur/xforms
> SAT+dec_feat3 -N models/qst001/ver1/dur/regTrees/dec.base -N
> models/qst001/ver1/dur/regTrees/dec.tree models/qst001/ver1/cmp/tiedlist
> models/qst001/ver1/dur/tiedlist
>
> HTK Configuration Parameters[24]
> Module/Tool Parameter Value
> # CDGV TRUE
> # GVOFFMODEL StrVec 2 sil sp
> # OPTKIND NEWTON
> # GVWEIGHT 1
> # HMMWEIGHT 1
> # STEPDEC 0.500000
> # STEPINC 1.200000
> # STEPINIT 1
> # MINEUCNORM 0.010000
> # GVEPSILON 0.000100
> # MAXGVITER 50
> # GVHMMLIST gv/qst001/ver1/tiedlist
> # GVMODELMMF gv/qst001/ver1/clustered_all.mmf
> # USEGV TRUE
> # EMEPSILON 0.000100
> # MAXEMITER 20
> # WINDIR data/win
> # WINFN StrVec 3 mgc.win1 mgc.win2 mgc.win3
> StrVec 3 lf0.win1 lf0.win2 lf0.win3
> StrVec 3 bap.win1 bap.win2 bap.win3
> # PDFSTREXT StrVec 3 mgc lf0 bap
> # PDFSTRORDER IntVec 3 40 1 5
> # PDFSTRSIZE IntVec 3 1 3 1
> # USEALIGN FALSE
> # NATURALWRITEORDER TRUE
> # NATURALREADORDER TRUE
>
> -----Original Message-----
> From: ura228@xxxxxxxxx [mailto:ura228@xxxxxxxxx] On Behalf Of Keiichiro Oura
> Sent: Tuesday, April 17, 2012 11:44 AM
> To: hts-users@xxxxxxxxxxxxxxx
> Cc: uratec
> Subject: [hts-users:03245] Re: duration modeling and dur file
>
> Hi,
>
> Could you try USEALIGN=FALSE ?
>
> Regards,
> Keiichiro Oura
>
>
> 2012/4/17 Heamin Lee <oasistony@xxxxxxxxxxx>
>>
>> Hi,
>>
>> The HMGenS command is below:
>>
>>
>>
>> HMGenS -A -B -C configs/syn.cnf -D -T 1 -S data/scp/gen.scp -t 1500
>> 100
>> 5000 -h */*_%%%_* -c 0 -H
>> models/qst001/ver1/cmp/re_clustered_sat_all.mmf -N
>> models/qst001/ver1/dur/re_clustered_sat_all.mmf -M
>> gen/qst001/ver1/SAT+dec_feat3/0 -a -J models/qst001/ver1/cmp/xforms
>> SAT+dec_feat3 -H models/qst001/ver1/cmp/regTrees/dec.base -H
>> models/qst001/ver1/cmp/regTrees/dec.tree -b -Y
>> models/qst001/ver1/dur/xforms
>> SAT+dec_feat3 -N models/qst001/ver1/dur/regTrees/dec.base -N
>> models/qst001/ver1/dur/regTrees/dec.tree
>> models/qst001/ver1/cmp/tiedlist models/qst001/ver1/dur/tiedlist
>>
>>
>>
>> HTK Configuration Parameters[24]
>> Module/Tool Parameter Value
>> # CDGV TRUE
>> # GVOFFMODEL StrVec 2 sil sp
>> # OPTKIND NEWTON
>> # GVWEIGHT 1
>> # HMMWEIGHT 1
>> # STEPDEC 0.500000
>> # STEPINC 1.200000
>> # STEPINIT 1
>> # MINEUCNORM 0.010000
>> # GVEPSILON 0.000100
>> # MAXGVITER 50
>> # GVHMMLIST gv/qst001/ver1/tiedlist
>> # GVMODELMMF gv/qst001/ver1/clustered_all.mmf
>> # USEGV TRUE
>> # EMEPSILON 0.000100
>> # MAXEMITER 20
>> # WINDIR data/win
>> # WINFN StrVec 3 mgc.win1 mgc.win2 mgc.win3
>> StrVec 3 lf0.win1 lf0.win2 lf0.win3
>> StrVec 3 bap.win1 bap.win2 bap.win3
>> # PDFSTREXT StrVec 3 mgc lf0 bap
>> # PDFSTRORDER IntVec 3 40 1 5
>> # PDFSTRSIZE IntVec 3 1 3 1
>> # USEALIGN TRUE
>> # NATURALWRITEORDER TRUE
>> # NATURALREADORDER TRUE
>>
>>
>>
>> From: ura228@xxxxxxxxx [mailto:ura228@xxxxxxxxx] On Behalf Of
>> Keiichiro Oura
>> Sent: Tuesday, April 17, 2012 1:11 AM
>> To: hts-users@xxxxxxxxxxxxxxx
>> Cc: uratec
>> Subject: [hts-users:03243] Re: duration modeling and dur file
>>
>>
>>
>> Hi,
>>
>>
>>
>> Please tell me your HMGenS options and config files.
>>
>>
>>
>> Regards,
>>
>> Keiichiro Oura
>>
>>
>>
>> 2012/4/16 Heamin Lee <oasistony@xxxxxxxxxxx>
>>
>> Thanks for your answer.
>>
>> I applied speaker adaptation to the duration models too.
>>
>> Now I understand that the transform matrices account for the difference.
>>
>> However, the more serious problem is that the state durations are set to
>> "1" for every state, regardless of the mean.
>>
>> Do you have a solution or any idea about this problem?
>>
>>
>>
>> *.dur
>>
>> AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[2]:
>> duration=1 (frame), mean=1.313743e+00
>> AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[3]:
>> duration=1 (frame), mean=2.475758e+00
>> AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[4]:
>> duration=1 (frame), mean=3.064046e+00
>> AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[5]:
>> duration=1 (frame), mean=3.948951e+00
>> AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[6]:
>> duration=1 (frame), mean=3.052463e+00
>> AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33:
>> duration=5 (frame), mean=1.385496e+01
>>
>>
>>
>> From: ura228@xxxxxxxxx [mailto:ura228@xxxxxxxxx] On Behalf Of
>> Keiichiro Oura
>> Sent: Monday, April 16, 2012 3:54 PM
>> To: hts-users@xxxxxxxxxxxxxxx
>> Cc: uratec
>> Subject: [hts-users:03240] Re: duration modeling and dur file
>>
>>
>>
>> Hi,
>>
>>
>>
>> Did you apply speaker adaptation to the duration models?
>>
>> I expect that the speaker adaptation transform matrices account for the
>> difference.
>>
>>
>>
>> Regards,
>>
>> Keiichiro Oura
>>
>>
>>
>>
>>
>> 2012/4/16 Heamin Lee <oasistony@xxxxxxxxxxx>
>>
>> Hello,
>>
>>
>>
>> I am using HTS-2.2 for speaker adaptation.
>>
>> After adaptation, HMGenS outputs mgc, lf0, bap, and dur files, and I
>> found that the mgc, lf0, and bap files are generated using the dur file.
>>
>> From the paper "Hidden Semi-Markov Model Based Speech Synthesis System",
>> I understand that state durations are ultimately determined by the
>> state duration mean vector.
>>
>> So I expected the mean in the *.dur file to be the same as the mean
>> vector of the duration model.
>>
>> However, the mean in the *.dur file differs slightly from the mean of
>> the duration model: for example, the *.dur file has 1.313743e+00, while
>> the duration model has 1.306419e+00.
>>
>> Why are they different? Is another algorithm used to generate the
>> duration mean vector, or is something wrong with this process?
>>
>>
>>
>> A more important problem is that sometimes the duration is set to "1"
>> for all states, regardless of the mean.
>>
>> The problem is shown below; it does not appear in speech generated
>> without adaptation. Why does this problem occur?
>>
>>
>>
>> *.dur
>>
>> AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[2]:
>> duration=1 (frame), mean=1.313743e+00
>> AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[3]:
>> duration=1 (frame), mean=2.475758e+00
>> AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[4]:
>> duration=1 (frame), mean=3.064046e+00
>> AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[5]:
>> duration=1 (frame), mean=3.948951e+00
>> AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[6]:
>> duration=1 (frame), mean=3.052463e+00
>> AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33:
>> duration=5 (frame), mean=1.385496e+01
>>
>>
>>
>> Duration model
>>
>> ~s "dur_s2_1623"
>> <STREAM> 1
>> <MEAN> 1
>> 1.306419e+00
>> <VARIANCE> 1
>> 1.695042e-01
>> <GCONST> 6.299961e-02
>> <STREAM> 2
>> <MEAN> 1
>> 2.176778e+00
>> <VARIANCE> 1
>> 1.851734e+00
>> <GCONST> 2.454000e+00
>> <STREAM> 3
>> <MEAN> 1
>> 3.774895e+00
>> <VARIANCE> 1
>> 1.702917e+01
>> <GCONST> 4.672805e+00
>> <STREAM> 4
>> <MEAN> 1
>> 5.540838e+00
>> <VARIANCE> 1
>> 3.468322e+01
>> <GCONST> 5.384133e+00
>> <STREAM> 5
>> <MEAN> 1
>> 2.440767e+00
>> <VARIANCE> 1
>> 8.953208e+00
>> <GCONST> 4.029889e+00
>>
>>
>>
>>
>
>
>
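
On the earlier question in this thread of why the means in the *.dur file differ from those in the duration model, the two quoted listings can be compared state by state. The small Python sketch below does only that arithmetic. It assumes that streams 1..5 of the quoted macro "dur_s2_1623" correspond to state[2]..state[6] of the quoted *.dur entry, which the thread suggests for the first state but does not confirm for the others. The function name mean_shifts is hypothetical.

```python
# Means from the quoted *.dur listing (state[2]..state[6]).
dur_file_means = [1.313743, 2.475758, 3.064046, 3.948951, 3.052463]
# Means from the quoted duration model macro "dur_s2_1623" (streams 1..5).
# Assumption: stream k lines up with state[k + 1] of the *.dur entry.
model_means = [1.306419, 2.176778, 3.774895, 5.540838, 2.440767]


def mean_shifts(generated, model):
    """Return (difference, ratio) of generated vs. model mean per state."""
    return [(g - m, g / m) for g, m in zip(generated, model)]


shifts = mean_shifts(dur_file_means, model_means)
for state, (diff, ratio) in enumerate(shifts, start=2):
    print(f"state[{state}]: diff={diff:+.6f} ratio={ratio:.4f}")
```

Under that assumption the shifts differ in both sign and size from state to state, so a single constant offset does not account for them.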
- Follow-Ups
  - [hts-users:03250] Re: duration modeling and dur file, Heamin Lee
- References
  - [hts-users:03239] duration modeling and dur file, Heamin Lee
  - [hts-users:03240] Re: duration modeling and dur file, Keiichiro Oura
  - [hts-users:03241] Re: duration modeling and dur file, Heamin Lee
  - [hts-users:03243] Re: duration modeling and dur file, Keiichiro Oura
  - [hts-users:03244] Re: duration modeling and dur file, Heamin Lee
  - [hts-users:03245] Re: duration modeling and dur file, Keiichiro Oura
  - [hts-users:03246] Re: duration modeling and dur file, Heamin Lee