Hi,
Antonio Bonafonte wrote (2008/02/29 4:52):
In Yoshimura's thesis (pages 22-28), it seems that each observation
includes the parameters (e.g. mcp, lf0) and also the spaces that
apply to the frame, which he calls the space selector, S(o_t).
I would expect that for every frame (assuming there are no delta
features) we code the S(o) information:
voiced frames => mcp and lf0
unvoiced frames => mcp
However, in the cmp files, the only information that signals whether
the frame is voiced is the lf0 value itself: for unvoiced frames the
value is log(0).
My question is: how does HTK/HTS identify the indexes of the
observation?
Does it detect the log(0) value?
I have seen that the prototype file (proto/*) describes which
streams are MSD and which are not, but I still don't know how each
particular frame is identified.
In HModel.c, you can find the following function:
/* EXPORT-> SpaceOrder: Count order of Vector which is excepted ignVal */
int SpaceOrder(Vector vec)
{
   int order;

   order = VectorSize(vec);
   while (order != 0) {
      if (vec[order] == ignoreValue) --order;
      else break;
   }
   return order;
}
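Vector vec is the observation vector for one stream.
HTK vectors are indexed from 1, so vec[order] is the last element
still under consideration.
The function strips trailing elements equal to ignoreValue and
returns the number of elements that remain.
By default, ignoreValue = LZERO.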
As you described, lf0 and its dynamic features take the value log(0)
in unvoiced frames.
In this case, SpaceOrder(vec) returns 0, which means that the
observation vector belongs to the 0-dimensional space.
We then use the distributions of the same dimensionality to calculate
the output probability of this observation.
This is how MSD is handled in HTS.
So SpaceOrder() works as the space selector S(o_t) in Yoshimura's
thesis.
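To make the behaviour above concrete, here is a minimal standalone
sketch of the same loop. It is not the HTS source: space_order() and
IGNORE_VALUE are hypothetical names, the vector size is passed
explicitly instead of being read with VectorSize(), and I am assuming
the default ignoreValue (LZERO) is -1.0E10.

```c
#include <stddef.h>

/* Assumption: stands in for HModel's ignoreValue, whose default is LZERO.
   The numeric value -1.0E10 is assumed here, not taken from this thread. */
#define IGNORE_VALUE (-1.0E10f)

/* Hypothetical stand-in for SpaceOrder().
   vec is indexed from 1 like an HTK Vector, so vec[0] is never read and
   the caller must supply an array of at least size + 1 elements. */
int space_order(const float *vec, int size)
{
    int order = size;

    /* Walk back from the last element while it equals the ignore value. */
    while (order != 0 && vec[order] == IGNORE_VALUE)
        --order;

    return order;
}
```

With a three-element lf0 stream (static, delta, delta-delta), a voiced
frame such as {4.85, 0.02, -0.01} gives 3, and an unvoiced frame in
which all three elements are IGNORE_VALUE gives 0. Note that only
trailing elements are stripped: {IGNORE_VALUE, 1.0} still gives 2.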
You can change ignoreValue to another value via IGNOREVALUE, a
configuration variable of HModel.
Regards,
Heiga ZEN (Byung Ha CHUN)