
[hts-users:04284] Re: HMGenS output pdf sequences' format


Dear Najeeb,

 

Matt is correct on this point.

 

As I understand it, part of the reason for using this “natural” parameterization instead of the usual N(mu, sigma^2) form is that it simplifies the implementation when solving Eq. (15) in [1]. Specifically, solving that equation requires the inverse of the variance rather than the variance itself, and the mean divided by the variance rather than the mean itself.
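Written out for a single dimension with a diagonal covariance, the relation looks like the following. The symbols b and p are only labels chosen here for the two stored quantities (the "b-value" and the precision). The last line is the usual form of the parameter generation system as I recall it, with W the window matrix and c the generated static trajectory; it is included only to show where the two quantities enter, so please check it against [1] before relying on it.

```latex
b = \frac{\mu}{\sigma^{2}}, \qquad p = \frac{1}{\sigma^{2}}
\quad\Longleftrightarrow\quad
\mu = \frac{b}{p}, \qquad \sigma^{2} = \frac{1}{p}

W^{\top} \Sigma^{-1} W \, c \;=\; W^{\top} \Sigma^{-1} \mu
```

With a diagonal \Sigma, the entries of \Sigma^{-1} are the p values and the entries of \Sigma^{-1}\mu are the b values, so both sides can be built directly from what is stored.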

 

To convert back to mean and variance, you can modify the WriteParms() function in the HMGenS.c source file using the relation above (mean = b-value / precision, variance = 1 / precision). These are the lines to change:

   /* output pdfs */
   if (outPdf) {
      WriteVector(pdffp, pst->mseq[pst->t], inBinary);
      if (pst->fullCov)
         WriteTriMat(pdffp, pst->vseq[pst->t].inv, inBinary);
      else
         WriteVector(pdffp, pst->vseq[pst->t].var, inBinary);
   }
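As a rough illustration of the conversion for the diagonal-covariance case, here is a minimal standalone sketch. It does not use the HTK vector types or the actual HMGenS.c code: natural_to_moment() is a hypothetical helper working on plain double arrays, and it assumes that a non-positive precision marks an entry with no valid Gaussian (such as the -1.0e+10 values in the dump quoted below), which it copies through unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of HTS/HTK.
 * Converts n dimensions from the stored "natural" form
 *   b[i]    = mean / variance   (b-value)
 *   prec[i] = 1 / variance      (precision)
 * back to mean[i] and var[i].
 * Assumption: prec[i] <= 0 means "no valid Gaussian" (e.g. the
 * -1.0e+10 entries); such entries are copied through unchanged.
 * Returns the number of dimensions actually converted. */
size_t natural_to_moment(const double *b, const double *prec,
                         double *mean, double *var, size_t n)
{
    size_t i;
    size_t converted = 0;

    assert(b != NULL && prec != NULL && mean != NULL && var != NULL);

    for (i = 0; i < n; i++) {
        if (prec[i] > 0.0) {
            var[i] = 1.0 / prec[i];    /* variance = 1 / precision */
            mean[i] = b[i] * var[i];   /* mean = b-value / precision */
            converted++;
        } else {
            mean[i] = b[i];            /* pass sentinel values through */
            var[i] = prec[i];
        }
    }
    return converted;
}
```

For example, the first static value in the quoted dump (b-value 2.108878e+02, precision 4.526616e+01) gives a mean of roughly 4.66 and a variance of roughly 0.022. If the lf0 stream holds natural-log F0, as the file extension suggests, that mean is on the order of 105 Hz.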

 

I have done this before.

 

Yang Wang

 

[1] K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in ICASSP, 2000, pp. 1315-1318.



2015-06-29 10:52 GMT+08:00 Matt Shannon <sms46@cam.ac.uk>:
Hi Najeeb,

I believe the values stored in HMGenS-generated pdf files do represent
a Gaussian distribution for each frame and window, but in the
"natural" (b-value, precision) parameterization rather than the (mean,
variance) parameterization.  The b-value is mean-times-precision and
the precision is inverse variance.  You should be able to easily check
this in the HMGenS.c source code.

Matt


On Sun, 28 Jun 2015 05:42:41 +0000 (UTC)
Najeeb Ullah Khan <najeeeb@ymail.com> wrote:

> I am trying to output the pdf sequences for an utterance using the -p
> flag of HMGenS tool. However, when I checked the .lf0_pdf file I
> found the following values.
>
> 2.108878e+02 0.000000e+00 0.000000e+00
> 4.526616e+01 0.000000e+00 0.000000e+00
> 2.108878e+02 -5.418909e+01 5.599444e-01
> 4.526616e+01 2.456803e+03 4.348116e+02
> 3.033320e+02 -4.598551e+01 2.942826e+00
> 6.523090e+01 4.490591e+03 1.066298e+03
> 3.033320e+02 -4.598551e+01 2.942826e+00
> 6.523090e+01 4.490591e+03 1.066298e+03
> 3.033320e+02 -4.598551e+01 2.942826e+00
> 6.523090e+01 4.490591e+03 1.066298e+03
> 4.374781e+02 6.018007e+01 1.525493e+00
> 9.478203e+01 7.552817e+03 1.075918e+03
> 4.374781e+02 6.018007e+01 1.525493e+00
> 9.478203e+01 7.552817e+03 1.075918e+03
> 4.374781e+02 0.000000e+00 0.000000e+00
> 9.478203e+01 0.000000e+00 0.000000e+00
> -1.000000e+10 -1.000000e+10 -1.000000e+10
> -1.000000e+10 -1.000000e+10 -1.000000e+10
> -1.000000e+10 -1.000000e+10 -1.000000e+10
> -1.000000e+10 -1.000000e+10 -1.000000e+10
> -1.000000e+10 -1.000000e+10 -1.000000e+10
> -1.000000e+10 -1.000000e+10 -1.000000e+10
> -1.000000e+10 -1.000000e+10 -1.000000e+10
> -1.000000e+10 -1.000000e+10 -1.000000e+10
>
> I think the first line is the mean of static and dynamic features and
> the second line is the variance? but the values of the mean are too
> high, do they represent F0, logF0 or some other scaling is used? What
> do they correspond to in terms of fundamental frequency in Hz? The
> sampling rate is 48000.
>
> Regards, Najeeb




Follow-Ups
[hts-users:04290] Re: HMGenS output pdf sequences' format, Najeeb Ullah Khan