[hts-users:03322] Re: About the prototype definition of the MSD-HMM

Date: Tue, 5 Jun 2012 09:40:47 +0800 (CST)

Hi,

Yes, you are right.

Streams 3 and 4 are the distributions of the first- and second-order dynamic features (delta and delta-delta), respectively.
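
For the KL divergence itself, here is a minimal sketch for one 1-dimensional MSD stream, assuming mixture 1 is the voiced Gaussian and mixture 2 is the zero-dimensional unvoiced space, so that the divergence splits into a term over the voicing weights plus the voiced weight times the Gaussian KL. The function names and the second model (model_b) are hypothetical and only for illustration; model_a uses the stream 2 values quoted below. The GCONST check assumes the HTK definition log((2*pi)^n * |Sigma|).

```python
import math

def kl_gauss_1d(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for 1-D Gaussians given means and variances."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def voiced_weight(model):
    """Voiced weight, renormalized because the stored weights may not sum to exactly 1."""
    return model["w_voiced"] / (model["w_voiced"] + model["w_unvoiced"])

def kl_msd_1d(a, b):
    """KL divergence between two 1-D MSD stream distributions (a || b).

    Assumes both voiced weights are strictly between 0 and 1.
    """
    wa, wb = voiced_weight(a), voiced_weight(b)
    # Discrete part: voiced/unvoiced decision.
    kl_weights = wa * math.log(wa / wb) + (1.0 - wa) * math.log((1.0 - wa) / (1.0 - wb))
    # Continuous part: only the voiced space carries a density.
    kl_voiced = kl_gauss_1d(a["mean"], a["var"], b["mean"], b["var"])
    return kl_weights + wa * kl_voiced

def gconst_1d(var):
    """GCONST for a 1-D Gaussian, assuming GCONST = log(2 * pi * variance)."""
    return math.log(2.0 * math.pi * var)

# Values taken from <STREAM> 2 of <STATE> 2 in the quoted monophone.mmf.
model_a = {"w_voiced": 5.459577e-01, "w_unvoiced": 4.540337e-01,
           "mean": 4.907650e+00, "var": 2.217153e-02}

# Hypothetical second model, made up purely to have something to compare against.
model_b = {"w_voiced": 0.60, "w_unvoiced": 0.40, "mean": 5.0, "var": 0.03}

if __name__ == "__main__":
    print("GCONST check:", gconst_1d(model_a["var"]))
    print("KL(a || b):  ", kl_msd_1d(model_a, model_b))
    print("KL(b || a):  ", kl_msd_1d(model_b, model_a))
```

Note that the KL divergence is not symmetric, so you may want to average the two directions. Streams 3 and 4 can be handled the same way with their own weights, means and variances.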

Xingyu Na (那兴宇)

Beijing Institute of Technology

naxy(at)bit.edu.cn

asr.naxingyu(at)gmail.com

naxingyu at {facebook, twitter, linkedin}


At 2012-06-05 02:43:21,"Kwan Lisa" <lisakwan1102@xxxxxxxxx> wrote:
>Hi,
>
>I have a question about the prototype definition of the MSD-HMM.
>My monophone.mmf in the model directory looks like this:
>~h "mo"
><BEGINHMM>
><NUMSTATES> 7
><STATE> 2
>
><STREAM> 1
><MEAN> 120
>...
><VARIANCE> 120
>...
><GCONST> -7.524412e+02
>
><STREAM> 2
><NUMMIXES> 2
><MIXTURE> 1 5.459577e-01
><MEAN> 1
> 4.907650e+00
><VARIANCE> 1
> 2.217153e-02
><GCONST> -1.971069e+00
><MIXTURE> 2 4.540337e-01
><MEAN> 0
><VARIANCE> 0
><GCONST> 0.000000e+00
>
><STREAM> 3
>...
>
><STREAM> 4
>...
>
>I want to use the monophone model to calculate the KL divergence
>of the frequency distributions. However, there is both voiced and
>unvoiced speech. I think streams 2, 3, and 4 are the frequency streams,
>and in each stream mixture 1 gives the distribution weight of the voiced
>speech and mixture 2 gives the distribution weight of the
>unvoiced speech. Is my interpretation correct?
>
>-- 
>Lisa Kwan
>lisakwan1102(at)gmail.com
>Advanced Speech Technology Lab, ASTL
>