
[hts-users:03327] Re: About the prototype definition of the MSD-HMM


Hi,
You can try assigning all of the F0 features (static, delta, and delta-delta) to one stream.
I have tried this and got a slightly lower F0 RMSE, and I remember there was a paper that used this structure (maybe by Heiga?).
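To make the suggestion concrete, here is a minimal Python sketch (not HTS or HTK code) that scores one voiced frame of F0 features under both layouts. It assumes diagonal covariances, stream weights of 1, and two-space MSD streams where mixture 1 is the voiced Gaussian and mixture 2 is the zero-dimensional unvoiced space (unvoiced weight taken as 1 minus the voiced weight), as in the prototype quoted below. The class `MSDStream`, the function `layout_log_likelihoods`, and every number except stream 2's weight, mean, and variance are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class MSDStream:
    """Hypothetical two-space MSD stream: voiced diagonal Gaussian + 0-dim unvoiced space."""
    voiced_weight: float   # weight of mixture 1 (voiced space); unvoiced = 1 - this
    mean: Sequence[float]  # voiced-space mean; its length is the stream dimension
    var: Sequence[float]   # voiced-space diagonal variances

    def log_prob(self, obs: Optional[Sequence[float]]) -> float:
        """Log output probability; obs=None means the frame is in the unvoiced space."""
        if obs is None:
            # Zero-dimensional space: only the space weight (mixture 2) contributes.
            return math.log(1.0 - self.voiced_weight)
        if len(obs) != len(self.mean):
            raise ValueError("observation dimension does not match the stream")
        ll = math.log(self.voiced_weight)
        for x, m, v in zip(obs, self.mean, self.var):
            ll -= 0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        return ll


# Layout A (as in the quoted prototype): lf0, delta lf0, delta-delta lf0 in streams 2, 3, 4.
separate_streams = [
    MSDStream(0.5459577, [4.907650], [0.02217153]),
    MSDStream(0.50, [0.0], [1.0e-3]),   # made-up values
    MSDStream(0.45, [0.0], [5.0e-4]),   # made-up values
]

# Layout B (one 3-dimensional MSD stream for all F0 features).
joint_stream = MSDStream(0.5459577, [4.907650, 0.0, 0.0], [0.02217153, 1.0e-3, 5.0e-4])


def layout_log_likelihoods(frame: Sequence[float]) -> tuple[float, float]:
    """Score one voiced frame [lf0, d_lf0, dd_lf0] under both layouts."""
    separate = sum(s.log_prob([x]) for s, x in zip(separate_streams, frame))
    joint = joint_stream.log_prob(frame)
    return separate, joint


sep, jnt = layout_log_likelihoods([5.0, 0.01, -0.002])
# The Gaussian terms are identical, so jnt - sep equals -(log(0.50) + log(0.45)),
# i.e. the two extra space weights that layout A multiplies in.
print(f"separate={sep:.4f} joint={jnt:.4f} diff={jnt - sep:.4f}")
```

Under these assumptions the per-frame Gaussian terms coincide, so the layouts differ in how the voiced/unvoiced space weights are shared: the joint stream forces lf0, delta lf0, and delta-delta lf0 to carry a single voiced/unvoiced label, whereas separate streams let each one be voiced or unvoiced independently.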

--
Xingyu Na (那兴宇)
Beijing Institute of Technology
naxy(at)bit.edu.cn
asr.naxingyu(at)gmail.com
naxingyu at {facebook, twitter, linkedin}


At 2012-06-06 02:45:20, "Kwan Lisa" <lisakwan1102@xxxxxxxxx> wrote:
>Hi,
>
>I don't understand why the distributions of F0 and its first- and
>second-order dynamics are placed in streams 2, 3, and 4 respectively, while
>the distributions of the spectral features and their first- and second-order
>dynamics are all placed in stream 1. What if I place all of the F0 features in
>stream 2 and treat them as 3-dimensional data, like the spectral data?
>
>2012/6/5 那兴宇 <nxy-yzqs@xxxxxxx>:
>> Hi,
>>
>> Yes, you are right.
>> Streams 3 and 4 are the distributions of the first- and second-order dynamics, respectively.
>>
>> At 2012-06-05 02:43:21, "Kwan Lisa" <lisakwan1102@xxxxxxxxx> wrote:
>>>Hi,
>>>
>>>I have a question about the definition of the MSD-HMM prototype.
>>>The monophone.mmf in my model directory looks like this:
>>>~h "mo"
>>><BEGINHMM>
>>><NUMSTATES> 7
>>><STATE> 2
>>>
>>><STREAM> 1
>>><MEAN> 120
>>>...
>>><VARIANCE> 120
>>>...
>>><GCONST> -7.524412e+02
>>>
>>><STREAM> 2
>>><NUMMIXES> 2
>>><MIXTURE> 1 5.459577e-01
>>><MEAN> 1
>>> 4.907650e+00
>>><VARIANCE> 1
>>> 2.217153e-02
>>><GCONST> -1.971069e+00
>>><MIXTURE> 2 4.540337e-01
>>><MEAN> 0
>>><VARIANCE> 0
>>><GCONST> 0.000000e+00
>>>
>>><STREAM> 3
>>>...
>>>
>>><STREAM> 4
>>>...
>>>
>>>I want to use the monophone model to calculate the KL divergence
>>>between F0 distributions. However, there is both voiced and unvoiced
>>>speech. I think streams 2, 3, and 4 are the F0 streams, and in each
>>>stream mixture 1 stands for the distribution weight of voiced speech
>>>and mixture 2 stands for the distribution weight of unvoiced speech.
>>>Is my interpretation correct?
>>>
>>>--
>>>Lisa Kwan
>>>lisakwan1102(at)gmail.com
>>>Advanced Speech Technology Lab, ASTL
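With the interpretation confirmed in the thread (mixture 1 is the voiced Gaussian, mixture 2 is the zero-dimensional unvoiced space), the KL divergence Lisa asks about can be written in closed form, because the two spaces are disjoint. For one stream, with voiced weights w_p and w_q and unvoiced weights u_p and u_q: KL(p || q) = w_p log(w_p / w_q) + u_p log(u_p / u_q) + w_p KL(N_p || N_q). The Python sketch below is a minimal illustration of that formula, not an HTS tool. It assumes stream weights of 1, so the streams are independent within a state and the per-stream divergences for streams 2 to 4 simply add. The names `MSDStream1D`, `kl_msd`, and `kl_f0_state`, and the second model's parameters, are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MSDStream1D:
    """One F0 stream of one state: mixture 1 = voiced 1-D Gaussian, mixture 2 = unvoiced (0-dim)."""
    voiced_w: float    # <MIXTURE> 1 weight
    unvoiced_w: float  # <MIXTURE> 2 weight (should be close to 1 - voiced_w)
    mean: float        # <MEAN> 1 of mixture 1
    var: float         # <VARIANCE> 1 of mixture 1


def _weighted_log_ratio(a: float, b: float) -> float:
    """a * log(a / b), using the convention 0 * log(0 / b) = 0."""
    return 0.0 if a == 0.0 else a * math.log(a / b)


def kl_gauss_1d(m1: float, v1: float, m2: float, v2: float) -> float:
    """KL(N(m1, v1) || N(m2, v2)) for univariate Gaussians (v = variance)."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def kl_msd(p: MSDStream1D, q: MSDStream1D) -> float:
    """KL(p || q) for two-space MSD distributions.

    The voiced and unvoiced spaces are disjoint, so the divergence is a term for the
    space weights plus the voiced Gaussian KL weighted by p's voiced weight. It is
    infinite (here: ZeroDivisionError) if q gives zero weight to a space p uses.
    """
    kl = _weighted_log_ratio(p.voiced_w, q.voiced_w)
    kl += _weighted_log_ratio(p.unvoiced_w, q.unvoiced_w)
    if p.voiced_w > 0.0:
        kl += p.voiced_w * kl_gauss_1d(p.mean, p.var, q.mean, q.var)
    return kl


def kl_f0_state(p_streams: Sequence[MSDStream1D], q_streams: Sequence[MSDStream1D]) -> float:
    """Sum over the F0 streams (2-4), treating streams as independent within a state."""
    return sum(kl_msd(p, q) for p, q in zip(p_streams, q_streams))


# Stream 2 of state 2 of "mo", copied from the quoted monophone.mmf.
mo_s2 = MSDStream1D(voiced_w=5.459577e-01, unvoiced_w=4.540337e-01,
                    mean=4.907650e+00, var=2.217153e-02)
# Hypothetical stream 2 of another model, for illustration only.
other_s2 = MSDStream1D(voiced_w=0.80, unvoiced_w=0.20, mean=5.10, var=0.03)

print(kl_msd(mo_s2, other_s2))
```

The weights are read exactly as stored in the MMF rather than forcing the unvoiced weight to 1 - voiced_w. Because HTK's stored weights can miss 1 by a tiny rounding error, the result is a close approximation of the true KL divergence.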

Follow-Ups
[hts-users:03328] Re: About the prototype definition of the MSD-HMM, Kwan Lisa
References
[hts-users:03321] About the prototype definition of the MSD-HMM, Kwan Lisa
[hts-users:03322] Re: About the prototype definition of the MSD-HMM, 那兴宇
[hts-users:03324] Re: About the prototype definition of the MSD-HMM, Kwan Lisa