
[hts-users:03328] Re: About the prototype definition of the MSD-HMM


Hi,

Thanks. I will try to do this.

2012/6/6 那兴宇 <nxy-yzqs@xxxxxxx>:
> Hi,
> You can try assigning all F0 features to one stream.
> I have tried this and got a slightly lower F0 RMSE, and I remember there was
> a paper using this structure (maybe by Heiga?).
>
>
> --
> Xingyu Na (那兴宇)
> Beijing Institute of Technology
> naxy(at)bit.edu.cn
> asr.naxingyu(at)gmail.com
> naxingyu at {facebook, twitter, linkedin}
>
>
> At 2012-06-06 02:45:20,"Kwan Lisa" <lisakwan1102@xxxxxxxxx> wrote:
>>Hi,
>>
>>I don't understand why the distributions of the frequency (F0) feature and
>>its first- and second-order dynamics are placed in streams 2, 3, and 4,
>>respectively, while the distributions of the spectral feature and its
>>first- and second-order dynamics are all placed in stream 1. What if I
>>place all of the frequency features in stream 2 and treat them as
>>3-dimensional data, like the spectral data?
>>
>>2012/6/5 那兴宇 <nxy-yzqs@xxxxxxx>:
>>> Hi,
>>>
>>> Yes, you are right.
>>> Streams 3 and 4 are the distributions of the first- and second-order dynamics, respectively.
>>>
>>>
>>>
>>> At 2012-06-05 02:43:21,"Kwan Lisa" <lisakwan1102@xxxxxxxxx> wrote:
>>>>Hi,
>>>>
>>>>I have a question about the definition of the prototype of the MSD-HMM.
>>>>My monophone.mmf in the model directory looks like this:
>>>>~h "mo"
>>>><BEGINHMM>
>>>><NUMSTATES> 7
>>>><STATE> 2
>>>>
>>>><STREAM> 1
>>>><MEAN> 120
>>>>...
>>>><VARIANCE> 120
>>>>...
>>>><GCONST> -7.524412e+02
>>>>
>>>><STREAM> 2
>>>><NUMMIXES> 2
>>>><MIXTURE> 1 5.459577e-01
>>>><MEAN> 1
>>>> 4.907650e+00
>>>><VARIANCE> 1
>>>> 2.217153e-02
>>>><GCONST> -1.971069e+00
>>>><MIXTURE> 2 4.540337e-01
>>>><MEAN> 0
>>>><VARIANCE> 0
>>>><GCONST> 0.000000e+00
>>>>
>>>><STREAM> 3
>>>>...
>>>>
>>>><STREAM> 4
>>>>...
>>>>
>>>>I want to use the monophone model to calculate the KL divergence
>>>>of the frequency distributions. However, there is both voiced and
>>>>unvoiced speech. I think streams 2, 3, and 4 are the frequency streams,
>>>>and in each stream mixture 1 stands for the distribution weight of the
>>>>voiced speech and mixture 2 stands for the distribution weight of the
>>>>unvoiced speech. Is my explanation correct?
>>>>
>>>>--
>>>>Lisa Kwan
>>>>lisakwan1102(at)gmail.com
>>>>Advanced Speech Technology Lab, ASTL
>>>>
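Taking the layout described in the question (mixture 1 = one-dimensional voiced Gaussian, mixture 2 = zero-dimensional unvoiced space), the KL divergence between two such MSD streams can be sketched as below. Because the voiced and unvoiced spaces are disjoint, the divergence splits into a KL between the space weights plus the voiced weight times the Gaussian KL; the 0-D space has no density, so it contributes only through its weight. The class and function names (`MSDStream`, `gauss_kl`, `msd_kl`) and the second model's numbers are hypothetical, and HTS stream weights are ignored.

```python
import math
from dataclasses import dataclass


@dataclass
class MSDStream:
    """One MSD stream: a 1-D voiced Gaussian plus a 0-D unvoiced space.

    Assumed layout, as in the excerpt above: mixture 1 = voiced Gaussian,
    mixture 2 = zero-dimensional unvoiced space (no mean/variance).
    """
    w_voiced: float    # <MIXTURE> 1 weight
    w_unvoiced: float  # <MIXTURE> 2 weight
    mean: float        # voiced mean (log F0)
    var: float         # voiced variance


def gauss_kl(m1: float, v1: float, m2: float, v2: float) -> float:
    """KL(N(m1, v1) || N(m2, v2)) for 1-D Gaussians; v1, v2 are variances."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def msd_kl(p: MSDStream, q: MSDStream) -> float:
    """KL(p || q) between two MSD streams with the layout above."""
    # Renormalise: stored weights need not sum exactly to 1.
    zp = p.w_voiced + p.w_unvoiced
    zq = q.w_voiced + q.w_unvoiced
    pv, pu = p.w_voiced / zp, p.w_unvoiced / zp
    qv, qu = q.w_voiced / zq, q.w_unvoiced / zq

    kl = 0.0
    # KL between the space (voiced/unvoiced) weights.
    for a, b in ((pv, qv), (pu, qu)):
        if a > 0.0:
            if b == 0.0:
                return math.inf  # p uses a space that q never uses
            kl += a * math.log(a / b)
    # Only the voiced space carries a density; the 0-D space adds nothing more.
    if pv > 0.0:
        kl += pv * gauss_kl(p.mean, p.var, q.mean, q.var)
    return kl


# Stream 2 of state 2 from the excerpt, and a made-up model for comparison.
p = MSDStream(5.459577e-01, 4.540337e-01, 4.907650e+00, 2.217153e-02)
q = MSDStream(0.9, 0.1, 5.0, 0.03)  # hypothetical values
print(msd_kl(p, q))
```

If streams 2, 3, and 4 are treated as independent given the state, the per-stream divergences can simply be added to get a per-state value, since KL is additive over independent factors.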
>>>
>>>
>>>
>>
>>
>>
>>
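One reason often given for keeping static F0, delta, and delta-delta in separate MSD streams is that the dynamic features are undefined at voiced/unvoiced boundaries, so their voicing can differ from that of the static value in the same frame. The Python sketch below illustrates this. The window coefficients, the "voiced only if every frame under the window is voiced" convention, and the edge clamping are assumptions for illustration, not necessarily what HTS or your own scripts use; `apply_window` and `mismatched_frames` are hypothetical helpers.

```python
from typing import List, Optional, Sequence

# Assumed regression windows (a common choice; check your own setup).
DELTA_WIN = (-0.5, 0.0, 0.5)
ACCEL_WIN = (1.0, -2.0, 1.0)

Track = List[Optional[float]]  # None marks an unvoiced frame


def apply_window(lf0: Track, win: Sequence[float]) -> Track:
    """Apply a regression window to an MSD log-F0 track.

    Assumed convention: frame t is voiced only when every frame under the
    window is voiced; otherwise it is unvoiced (None). Indices past the
    utterance edges are clamped to the first/last frame.
    """
    half = len(win) // 2
    out: Track = []
    for t in range(len(lf0)):
        acc = 0.0
        voiced = True
        for k, c in enumerate(win):
            i = min(max(t + k - half, 0), len(lf0) - 1)
            if lf0[i] is None:
                voiced = False
                break
            acc += c * lf0[i]
        out.append(acc if voiced else None)
    return out


def mismatched_frames(lf0: Track) -> List[int]:
    """Frames where static, delta, and delta-delta disagree on voicing."""
    d = apply_window(lf0, DELTA_WIN)
    a = apply_window(lf0, ACCEL_WIN)
    return [t for t in range(len(lf0))
            if len({lf0[t] is None, d[t] is None, a[t] is None}) > 1]


lf0 = [None, 5.0, 5.1, 5.2, None]
print(mismatched_frames(lf0))  # -> [1, 3]
```

Frames 1 and 3 are voiced in the static stream but unvoiced in the delta stream, so a single 3-dimensional stream with one voiced/unvoiced decision per frame cannot represent them directly; putting all F0 features in one stream needs some other way of handling such boundary frames.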



-- 
Lisa Kwan
lisakwan1102(at)gmail.com
Advanced Speech Technology Lab, ASTL
