[hts-users:03325] Re: About the prototype definition of the MSD-HMM
- From: 王洋 <yangwang84@xxxxxxxxx>
- Date: Wed, 6 Jun 2012 09:01:08 +0800
Hi, Kwan Lisa,
The frequency feature consists of a 1-dimensional continuous value in
voiced regions and a 0-dimensional symbol in unvoiced regions. That is
why the frequency feature is modeled by a multi-space probability
distribution (MSD) rather than by an ordinary 1-dimensional
distribution.
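
As a rough illustration of what this means for a single frame, here is a minimal Python sketch of the output probability of one MSD stream. It assumes one Gaussian in the 1-dimensional space and a bare weight in the 0-dimensional space; the function name msd_log_likelihood and the use of None to mark an unvoiced frame are my own choices for illustration, not anything from HTS. The numbers are copied from stream 2 of the monophone.mmf quoted below.

```python
import math

def msd_log_likelihood(obs, w_voiced, mean, var, w_unvoiced):
    """Log output probability of one MSD stream for a single frame.

    obs is the 1-dimensional frequency value for a voiced frame,
    or None for an unvoiced frame (hypothetical convention).
    """
    if obs is None:
        # 0-dimensional space: there is no density, only the space weight.
        return math.log(w_unvoiced)
    # 1-dimensional space: space weight times a Gaussian density.
    log_gauss = -0.5 * (math.log(2.0 * math.pi * var) + (obs - mean) ** 2 / var)
    return math.log(w_voiced) + log_gauss

# Values copied from <STREAM> 2 of the quoted monophone.mmf.
W_VOICED = 5.459577e-01
MEAN = 4.907650e+00
VAR = 2.217153e-02
W_UNVOICED = 4.540337e-01

print(msd_log_likelihood(4.9, W_VOICED, MEAN, VAR, W_UNVOICED))   # voiced frame
print(msd_log_likelihood(None, W_VOICED, MEAN, VAR, W_UNVOICED))  # unvoiced frame
```

The point of the sketch is that a voiced frame and an unvoiced frame are scored by different spaces of the same stream, which a single 1-dimensional Gaussian cannot do.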
You may further ask why MSD streams 2-4 are not put into a single
stream. I think that is not possible. Stream 2 has 2 mixtures (as do
streams 3 and 4), and the 2 mixture weights, one for the 1-dimensional
space and one for the 0-dimensional space, sum to 1. If you merged MSD
streams 2-4 into a single stream, it would be unclear how to assign
the mixture weights in a mathematically correct way. The direct
solution is to keep them as 3 separate streams, which is what you see
in the HTS-Demo implementation.
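
For the KL divergence asked about earlier in this thread, the per-stream layout keeps things simple. The following is only a sketch under my own assumptions: each stream has one Gaussian in the 1-dimensional space and one weight in the 0-dimensional space, and the dict layout and function names are hypothetical. Under those assumptions the KL divergence of one stream splits into a term for the two space weights plus the Gaussian KL scaled by the voiced weight. The values in p are copied from stream 2 of the quoted monophone.mmf; the values in q are made up purely for illustration.

```python
import math

def kl_gauss_1d(mean_p, var_p, mean_q, var_q):
    """KL(N_p || N_q) for two 1-dimensional Gaussians."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mean_p - mean_q) ** 2) / var_q
                  - 1.0)

def kl_msd_stream(p, q):
    """KL(p || q) for one MSD stream with a 1-D voiced space and a
    0-D unvoiced space (dict keys are a hypothetical layout)."""
    # Contribution of the voiced/unvoiced space weights.
    kl = p["w_voiced"] * math.log(p["w_voiced"] / q["w_voiced"])
    kl += p["w_unvoiced"] * math.log(p["w_unvoiced"] / q["w_unvoiced"])
    # Contribution of the Gaussians, which only matter in voiced frames.
    kl += p["w_voiced"] * kl_gauss_1d(p["mean"], p["var"], q["mean"], q["var"])
    return kl

# Stream 2 of the quoted monophone.mmf.
p = {"w_voiced": 5.459577e-01, "w_unvoiced": 4.540337e-01,
     "mean": 4.907650e+00, "var": 2.217153e-02}
# Made-up second model, for illustration only.
q = {"w_voiced": 0.7, "w_unvoiced": 0.3, "mean": 5.1, "var": 0.03}

print(kl_msd_stream(p, q))
```

If you additionally treat streams 2, 3, and 4 as independent, which is my assumption here, the total would be the sum of the three per-stream values.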
YangWang
2012/6/6 Kwan Lisa <lisakwan1102@xxxxxxxxx>:
> Hi,
>
> I don't understand why the distributions of the frequency feature and
> its first- and second-order dynamics are placed in streams 2, 3, and 4
> respectively, while the distributions of the spectral feature and its
> first- and second-order dynamics are all placed in stream 1. What if I
> placed all of the frequency features in stream 2 and treated them as
> 3-dimensional data, like the spectral data?
>
> 2012/6/5 那兴宇 <nxy-yzqs@xxxxxxx>:
>> Hi,
>>
>> Yes, you are right.
>> Streams 3 and 4 are the distributions of the first- and second-order dynamics, respectively.
>>
>> --
>> Xingyu Na (那兴宇)
>> Beijing Institute of Technology
>> naxy(at)bit.edu.cn
>> asr.naxingyu(at)gmail.com
>> naxingyu at {facebook, twitter, linkedin}
>>
>>
>> At 2012-06-05 02:43:21,"Kwan Lisa" <lisakwan1102@xxxxxxxxx> wrote:
>>>Hi,
>>>
>>>I have a question about the definition of the prototype of the MSD-HMM.
>>>The monophone.mmf in my model directory looks like this:
>>>~h "mo"
>>><BEGINHMM>
>>><NUMSTATES> 7
>>><STATE> 2
>>>
>>><STREAM> 1
>>><MEAN> 120
>>>...
>>><VARIANCE> 120
>>>...
>>><GCONST> -7.524412e+02
>>>
>>><STREAM> 2
>>><NUMMIXES> 2
>>><MIXTURE> 1 5.459577e-01
>>><MEAN> 1
>>> 4.907650e+00
>>><VARIANCE> 1
>>> 2.217153e-02
>>><GCONST> -1.971069e+00
>>><MIXTURE> 2 4.540337e-01
>>><MEAN> 0
>>><VARIANCE> 0
>>><GCONST> 0.000000e+00
>>>
>>><STREAM> 3
>>>...
>>>
>>><STREAM> 4
>>>...
>>>
>>>I want to use the monophone model to calculate the KL divergence
>>>of the frequency distributions. However, there is both voiced and
>>>unvoiced speech. I think streams 2, 3, and 4 are the frequency
>>>streams, and in each stream mixture 1 represents the weighted
>>>distribution of the voiced speech and mixture 2 represents the
>>>weight of the unvoiced speech. Is my understanding correct?
>>>
>>>--
>>>Lisa Kwan
>>>lisakwan1102(at)gmail.com
>>>Advanced Speech Technology Lab, ASTL
>>>
>>
>>
>>
>
>
>
> --
> Lisa Kwan
> lisakwan1102(at)gmail.com
> Advanced Speech Technology Lab, ASTL
>
- References
- [hts-users:03321] About the prototype definition of the MSD-HMM, Kwan Lisa
- [hts-users:03322] Re: About the prototype definition of the MSD-HMM, 那兴宇
- [hts-users:03324] Re: About the prototype definition of the MSD-HMM, Kwan Lisa