[hts-users:01238] Re: Multispace probability distribution!


Hi,

Lefteris Banos wrote (2008/03/17 22:51):

I would like to ask why, when modeling F0 plus its derivatives with a
multi-space probability distribution, HTS uses three separate
one-dimensional streams (f0, df0, ddf0) rather than a single
three-dimensional stream?

Usually delta and delta^2 log f0 are calculated as follows:

delta f0 = 0.5*f0_{t+1} - 0.5*f0_{t-1}
delta^2 f0 = f0_{t+1} - 2*f0_t + f0_{t-1}

On voiced/unvoiced boundaries, we cannot calculate delta and delta^2 f0 because f0_{t+1} or f0_{t-1} is unvoiced.
Therefore, the static, delta, and delta^2 f0 values within voiced regions, on voiced/unvoiced boundaries, and within unvoiced regions belong to the following spaces:

                                            MSD space
            within voiced region   voiced/unvoiced boundaries   within unvoiced region
f0           continuous             continuous                   discrete
delta f0     continuous             discrete                     discrete
delta^2 f0   continuous             discrete                     discrete
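
As a rough illustration of the table, here is a minimal Python sketch that applies the two formulas above to a short log f0 sequence. The function name `f0_features`, the use of `None` as the unvoiced marker, and the choice to treat the sequence edges like unvoiced neighbours are all assumptions made for this example; this is not code taken from HTS.

```python
UNVOICED = None  # assumed marker for a frame in the discrete (unvoiced) space

def f0_features(lf0):
    """Return (static, delta, delta2) lists; UNVOICED marks a discrete observation."""
    n = len(lf0)
    static, delta, delta2 = [], [], []
    for t in range(n):
        cur = lf0[t]
        # Assumption: frames outside the sequence are treated as unvoiced.
        prev = lf0[t - 1] if t > 0 else UNVOICED
        nxt = lf0[t + 1] if t < n - 1 else UNVOICED
        static.append(cur)
        if cur is UNVOICED or prev is UNVOICED or nxt is UNVOICED:
            # Unvoiced frame or voiced/unvoiced boundary: dynamics are undefined.
            delta.append(UNVOICED)
            delta2.append(UNVOICED)
        else:
            delta.append(0.5 * nxt - 0.5 * prev)
            delta2.append(nxt - 2.0 * cur + prev)
    return static, delta, delta2

# unvoiced, boundary, voiced, boundary, unvoiced
lf0 = [None, 5.0, 5.1, 5.3, None]
static, delta, delta2 = f0_features(lf0)
```

In this toy sequence only the middle frame has all three values defined; the two boundary frames keep a continuous static value but get discrete delta and delta^2 values, matching the middle column of the table.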

If we want to use an MSD with one discrete space for unvoiced frames and one (or more) continuous space(s) for voiced frames, the above kind of observation sequence cannot be modeled because frames on voiced/unvoiced boundaries are 1-dimensional while frames in voiced regions are 3-dimensional.

To avoid this problem, we use a 3-stream structure for modeling F0.
In this case, f0, delta f0, and delta^2 f0 are modeled by 3 independent data streams.
With this structure, the above observation sequence can be modeled in each stream using an MSD with one discrete space for unvoiced frames and 1-dimensional continuous space(s) for voiced frames.
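
The dimensionality argument can be checked with a small Python sketch. The helper names `frame_dims` and `stream_dims` and the toy values are hypothetical, and `None` again stands for an observation in the discrete (unvoiced) space.

```python
def frame_dims(static, delta, delta2):
    """Continuous dimensions per frame if all three features share one stream."""
    return [sum(v is not None for v in frame)
            for frame in zip(static, delta, delta2)]

def stream_dims(stream):
    """Continuous dimensions per frame within a single 1-dimensional stream."""
    return [0 if v is None else 1 for v in stream]

# Toy sequence: unvoiced, boundary, voiced, boundary, unvoiced
static = [None, 5.0, 5.1, 5.3, None]
delta = [None, None, 0.15, None, None]
delta2 = [None, None, 0.10, None, None]

single = frame_dims(static, delta, delta2)
per_stream = [stream_dims(s) for s in (static, delta, delta2)]

print(single)      # [0, 1, 3, 1, 0]
print(per_stream)
```

With one stream, the frames have 0, 1, or 3 continuous dimensions, so an MSD with only a 0-dimensional space and 3-dimensional space(s) cannot cover the boundary frames. With three streams, every observation is either discrete or 1-dimensional.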

Would it be correct to model f0 with one stream,
or do you suggest using the same structure anyway?

It depends on the properties of your new feature.
If your feature has properties similar to those of F0, I think you should use the same structure.
Otherwise, model it using a single stream.

I am interested in this because I want to use another stream structure to
model the HMMs, but using that structure for lf0 (3 streams) I will have
more than 6-7 streams, which may make the training unstable!

Why do you think using 6-7 streams would make the training unstable?
I do not understand what you mean.

Regards,

Heiga ZEN (Byung Ha CHUN)

--
------------------------------------------------
Heiga ZEN     (in Japanese pronunciation)
Byung Ha CHUN (in Korean pronunciation)

Department of Computer Science and Engineering
Nagoya Institute of Technology
Gokiso-cho, Showa-ku, Nagoya 466-8555 Japan

http://www.sp.nitech.ac.jp/~zen
------------------------------------------------