[hts-users:03321] About the prototype definition of the MSD-HMM
- From: Kwan Lisa <lisakwan1102@xxxxxxxxx>
- Date: Tue, 5 Jun 2012 02:43:21 +0800
Hi,
I have a question about the prototype definition of the MSD-HMM.
My monophone.mmf in the model directory looks like this:
~h "mo"
<BEGINHMM>
<NUMSTATES> 7
<STATE> 2
<STREAM> 1
<MEAN> 120
...
<VARIANCE> 120
...
<GCONST> -7.524412e+02
<STREAM> 2
<NUMMIXES> 2
<MIXTURE> 1 5.459577e-01
<MEAN> 1
4.907650e+00
<VARIANCE> 1
2.217153e-02
<GCONST> -1.971069e+00
<MIXTURE> 2 4.540337e-01
<MEAN> 0
<VARIANCE> 0
<GCONST> 0.000000e+00
<STREAM> 3
...
<STREAM> 4
...
I want to use the monophone model to calculate the KL divergence
between the frequency distributions. However, the speech contains both
voiced and unvoiced regions. I think streams 2, 3, and 4 are the
frequency streams, and that in each of these streams the weight of
mixture 1 is the weight of the voiced distribution and the weight of
mixture 2 is the weight of the unvoiced distribution. Is my
understanding correct?
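
To make the intended calculation concrete, here is a minimal sketch of the
KL divergence between two such streams. It assumes the interpretation above
holds: mixture 1 is a 1-dimensional Gaussian for voiced frames and mixture 2
is a 0-dimensional (unvoiced) space that carries only a weight. Under that
assumption the divergence splits into a term for the voiced/unvoiced weights
plus the Gaussian KL scaled by the voiced weight. The function names are
hypothetical and are not part of HTS or HTK, and the weights are assumed to
lie strictly between 0 and 1.

```python
import math


def kl_gauss_1d(mean_p, var_p, mean_q, var_q):
    """KL(p || q) between two univariate Gaussians (closed form)."""
    return 0.5 * (
        math.log(var_q / var_p)
        + (var_p + (mean_p - mean_q) ** 2) / var_q
        - 1.0
    )


def kl_msd_stream(p, q):
    """KL(p || q) for one two-space MSD stream.

    p and q are dicts with keys:
      "w_voiced", "w_unvoiced": mixture weights (assumed > 0)
      "mean", "var": parameters of the 1-D voiced Gaussian
    This layout is an illustrative assumption, not an HTS data structure.
    """
    # Renormalise, since weights printed in an MMF may not sum exactly to 1.
    zp = p["w_voiced"] + p["w_unvoiced"]
    zq = q["w_voiced"] + q["w_unvoiced"]
    pv, pu = p["w_voiced"] / zp, p["w_unvoiced"] / zp
    qv, qu = q["w_voiced"] / zq, q["w_unvoiced"] / zq

    # Discrete part: divergence between the voiced/unvoiced weights.
    kl_weights = pv * math.log(pv / qv) + pu * math.log(pu / qu)

    # Continuous part: only the voiced space has a density to compare.
    kl_voiced = kl_gauss_1d(p["mean"], p["var"], q["mean"], q["var"])
    return kl_weights + pv * kl_voiced


# Stream 2, state 2 of the model quoted above.
state_a = {"w_voiced": 5.459577e-01, "w_unvoiced": 4.540337e-01,
           "mean": 4.907650e+00, "var": 2.217153e-02}
# A second, made-up distribution purely for illustration.
state_b = {"w_voiced": 0.7, "w_unvoiced": 0.3, "mean": 5.0, "var": 0.03}

print(kl_msd_stream(state_a, state_b))
```

The divergence is not symmetric, so kl_msd_stream(state_a, state_b) and
kl_msd_stream(state_b, state_a) generally differ; a symmetric distance can
be obtained by summing the two directions.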
--
Lisa Kwan
lisakwan1102(at)gmail.com
Advanced Speech Technology Lab, ASTL
- Follow-Ups
  - [hts-users:03322] Re: About the prototype definition of the MSD-HMM, 那兴宇