[hts-users:01193] Re: How does HTK/HTS knows if the frame is voiced/unvoiced?


Hi,

Antonio Bonafonte wrote (2008/02/29 4:52):

In Yoshimura's thesis, pages 22-28, it seems that each observation
includes the parameters (e.g. mcp, lf0) and also the spaces that
apply to the frame. He calls this the space selector, S(o_t).

I would expect that for every frame (assuming there are no delta features),
we would code the S(o) information:
   voiced frames => mcp and lf0
   unvoiced frames => mcp

However, in the cmp files, the only information that signals if the frame is voiced or not,
is the lf0 value itself: for unvoiced frames the value is log(0).

My question is: how does HTK/HTS identify the space indexes of an observation?
Does it identify the log(0) value?
I have seen that the prototype file (proto/*) describes which
streams are MSD and which are not, but I still don't know how each particular frame is identified.

In HModel.c, you can find the following function:

/* EXPORT-> SpaceOrder: Count order of Vector which is excepted ignVal */
int SpaceOrder(Vector vec)
{
   int order;

   /* HTK vectors are indexed 1..VectorSize(vec) */
   order = VectorSize(vec);

   /* strip trailing elements equal to ignoreValue */
   while (order != 0) {
      if (vec[order] == ignoreValue) --order;
      else break;
   }

   return order;
}

Vector vec is an observation vector for a stream.
SpaceOrder() strips trailing elements equal to ignoreValue and returns the number of elements that remain, i.e. the index of the last element that is not ignoreValue.
By default, ignoreValue = LZERO.
As you described, lf0 and its dynamic features take the value log(0) in unvoiced frames.
In this case, SpaceOrder(vec) returns 0,
which means that the observation vector belongs to the 0-dimensional space.
The output probability of the observation is then calculated using the distributions that have the same dimensionality.
This is how MSD is handled in HTS:
SpaceOrder() works as the space selector S(o_t) in Yoshimura's thesis.
You can set ignoreValue to a different value via IGNOREVALUE, a configuration variable of HModel.

Regards,

Heiga ZEN (Byung Ha CHUN)

--
------------------------------------------------
Heiga ZEN     (in Japanese pronunciation)
Byung Ha CHUN (in Korean pronunciation)

Department of Computer Science and Engineering
Nagoya Institute of Technology
Gokiso-cho, Showa-ku, Nagoya 466-8555 Japan

http://www.sp.nitech.ac.jp/~zen
------------------------------------------------
