- *Subject*: [hts-users:03240] Re: duration modeling and dur file
- *From*: Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>
- *Date*: Mon, 16 Apr 2012 15:53:38 +0900
- *Cc*: uratec <uratec@xxxxxxxxxxxx>
- *Delivered-to*: hts-users@xxxxxxxxxxxxxxx

Do you apply speaker adaptation to the duration models?

If so, I expect that the transform matrices of speaker adaptation account for the difference.

Regards,

Keiichiro Oura

2012/4/16 Heamin Lee <oasistony@xxxxxxxxxxx>

Hello,

I am using HTS-2.2 for speaker adaptation.

After adaptation, HMGenS outputs mgc, lf0, bap, and dur files, and I found that the mgc, lf0, and bap files are generated according to the dur file.

From the paper "Hidden Semi-Markov Model Based Speech Synthesis System", I understood that state durations are ultimately determined by the state duration mean vector.

So I expected the mean in the *.dur file to be the same as the mean vector of the duration model.

However, the mean in the *.dur file is slightly different from the mean of the duration model. For example, the mean in the *.dur file is 1.313743e+00, while the mean of the duration model is 1.306419e+00.

Why is it different? Is another algorithm used to generate the duration mean vector, or is something wrong with this process?

A more important problem is that sometimes the duration is set to "1" for all states, regardless of the mean.

The problem is shown below. It does not appear in speech synthesized without adaptation. Why does it occur?

*.dur

AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[2]: duration=1 (frame), mean=1.313743e+00

AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[3]: duration=1 (frame), mean=2.475758e+00

AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[4]: duration=1 (frame), mean=3.064046e+00

AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[5]: duration=1 (frame), mean=3.948951e+00

AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33.state[6]: duration=1 (frame), mean=3.052463e+00

AX^P-WW+L=R@2_2/A:2/B:3-2-&14-20|WW/C:2/D:2/E:3+&7/F:4/H:12=33: duration=5 (frame), mean=1.385496e+01
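For comparison, here is a minimal Python sketch of the duration rule usually given in the HSMM synthesis literature: each state duration is the mean plus a rate factor rho times the variance, rounded to whole frames and clamped to at least one frame. This is not the HMGenS code path, and the names `state_durations`, `rho` and `min_frames` are assumptions made for illustration. Only the means are taken from the listing above.

```python
import math


def state_durations(means, variances=None, rho=0.0, min_frames=1):
    """Textbook rule d_k = mean_k + rho * var_k, rounded to whole frames.

    Illustrative only: HMGenS may round or distribute frames differently.
    """
    if variances is None:
        # With rho == 0 the variances do not affect the result.
        variances = [0.0] * len(means)
    durations = []
    for mean, var in zip(means, variances):
        frames = math.floor(mean + rho * var + 0.5)  # round half up
        durations.append(max(min_frames, frames))    # at least one frame
    return durations


# Means printed for state[2]..state[6] in the *.dur listing above.
dur_file_means = [1.313743, 2.475758, 3.064046, 3.948951, 3.052463]

print(state_durations(dur_file_means))  # [1, 2, 3, 4, 3]
```

Under this rule with rho = 0, the listed means would give 13 frames in total, not the 5 reported. A sufficiently negative rho clamps every state to one frame, but whether that is what happens in the adapted system is not established in this thread.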

Duration model

~s "dur_s2_1623"

<STREAM> 1

<MEAN> 1

1.306419e+00

<VARIANCE> 1

1.695042e-01

<GCONST> 6.299961e-02

<STREAM> 2

<MEAN> 1

2.176778e+00

<VARIANCE> 1

1.851734e+00

<GCONST> 2.454000e+00

<STREAM> 3

<MEAN> 1

3.774895e+00

<VARIANCE> 1

1.702917e+01

<GCONST> 4.672805e+00

<STREAM> 4

<MEAN> 1

5.540838e+00

<VARIANCE> 1

3.468322e+01

<GCONST> 5.384133e+00

<STREAM> 5

<MEAN> 1

2.440767e+00

<VARIANCE> 1

8.953208e+00

<GCONST> 4.029889e+00
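The two sets of numbers quoted in this message can be compared state by state. The short Python sketch below does that, assuming that stream k of the `~s "dur_s2_1623"` macro corresponds to state k+1 in the *.dur listing (an assumption, not something confirmed in the thread). The helper name `mean_differences` is made up for illustration.

```python
# Means printed in the *.dur listing (state[2]..state[6]) and the <MEAN>
# values of streams 1..5 in ~s "dur_s2_1623", both copied from this message.
dur_file_means = [1.313743, 2.475758, 3.064046, 3.948951, 3.052463]
model_means = [1.306419, 2.176778, 3.774895, 5.540838, 2.440767]


def mean_differences(generated, model):
    """Return generated - model for each state (assumes matching order)."""
    return [g - m for g, m in zip(generated, model)]


diffs = mean_differences(dur_file_means, model_means)
for state, diff in enumerate(diffs, start=2):
    print(f"state[{state}]: dur-file mean - model mean = {diff:+.6f}")
```

Under that mapping, every state's mean differs from the stored model mean, not only the first one, which is what one would expect if a transform is applied to the duration means at generation time.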

**Follow-Ups**: **[hts-users:03241] Re: duration modeling and dur file**, *Heamin Lee*

**References**: **[hts-users:03239] duration modeling and dur file**, *Heamin Lee*
