[hts-users:01195] Re: How does HTK/HTS knows if the frame is voiced/unvoiced?



Thank you very much, Heiga, this is exactly what I wanted to know.
(And thank you too, Thomas, for trying.)

Antonio

Heiga ZEN (Byung Ha CHUN) wrote:
Hi,

Antonio Bonafonte wrote (2008/02/29 4:52):

In Yoshimura's thesis, pages 22-28, it seems that each observation
includes the parameters (e.g. mcp, lf0) and also the indexes of the spaces
that apply to the frame. He calls this the space selector, S(o_t).

I would expect that for every frame (assuming there are no delta features),
we code the S(o) information:
   voiced frames => mcp and lf0
   unvoiced frames => mcp

However, in the cmp files, the only information that signals if the frame is voiced or not,
is the lf0 value itself: for unvoiced frames the value is log(0).

My question is: how does HTK/HTS identify the space indexes of each observation?
Does it detect the log(0) value?
I have seen that the prototype file (proto/*) describes which
streams are MSD and which are not, but I still don't know how each particular frame is identified.

In HModel.c, you can find the following function:

/* EXPORT-> SpaceOrder: Count order of Vector which is excepted ignVal */
int SpaceOrder(Vector vec)
{
 int order;

 order = VectorSize(vec);

 while (order != 0) {
    if (vec[order] == ignoreValue) --order;
    else break;
 }

 return order;
}

Vector vec is an observation vector for a stream.
This function returns the number of elements left in vec once the trailing elements equal to ignoreValue have been dropped.
By default, ignoreValue = LZERO.
As you described, lf0 and its dynamic features take log(0) in unvoiced frames.
In this case, SpaceOrder(vec) becomes 0.
This means that the observation vector belongs to a 0-dimensional space.
We then use the distribution with the same dimensionality to calculate the output probability of this observation.
This is how MSD is handled in HTS.
So SpaceOrder() works as the space selector S(o_t) in Yoshimura's article. You can set ignoreValue to a different value through the HModel configuration variable IGNOREVALUE.
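
To make the behaviour concrete, here is a minimal standalone sketch of the same loop. It is not the HTS code. The names space_order, SKETCH_LZERO and the three helper functions are hypothetical, the vector is a plain float array used 1-based (element 0 is unused), and SKETCH_LZERO is assumed to play the role of LZERO.

```c
/* Assumed stand-in for LZERO (the "log of zero" marker). The sketch only
   requires that unvoiced lf0 entries hold exactly this value. */
#define SKETCH_LZERO (-1.0e10f)

/* Same loop as SpaceOrder(), on a plain array: elements are v[1..n].
   Trailing elements equal to ignore_value are dropped; the index of the
   last remaining element is returned (0 if none remain). */
int space_order(const float *v, int n, float ignore_value)
{
    int order = n;
    while (order != 0 && v[order] == ignore_value)
        --order;
    return order;
}

/* Voiced frame: lf0, delta and delta-delta all carry real values. */
int voiced_lf0_order(void)
{
    const float lf0[] = {0.0f, 5.1f, 0.02f, -0.01f};
    return space_order(lf0, 3, SKETCH_LZERO);
}

/* Unvoiced frame: every element is the ignore value, so the order is 0
   and the 0-dimensional space is selected. */
int unvoiced_lf0_order(void)
{
    const float lf0[] = {0.0f, SKETCH_LZERO, SKETCH_LZERO, SKETCH_LZERO};
    return space_order(lf0, 3, SKETCH_LZERO);
}

/* Only trailing ignore values are dropped: a leading one still counts. */
int leading_ignore_order(void)
{
    const float lf0[] = {0.0f, SKETCH_LZERO, 0.02f, SKETCH_LZERO};
    return space_order(lf0, 3, SKETCH_LZERO);
}
```

In this sketch the voiced vector gives order 3 and the unvoiced vector gives order 0. The last helper gives 2, which shows that the loop stops at the first non-ignored element counted from the end.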

Regards,

Heiga ZEN (Byung Ha CHUN)



--
_______________________________________________________________________

Antonio Bonafonte
Departament de Teoria del Senyal i Comunicacions
TALP Research Center
Universitat Politècnica de Catalunya     Tel: +34-93 401 0764 (-6440)
Campus Nord, Edifici D5                  Fax: +34-93 401 6447
c/ Jordi Girona 1-3                      e-mail: antonio.bonafonte@xxxxxxx
08034 Barcelona, SPAIN                   http://gps-tsc.upc.es/veu/
_______________________________________________________________________

