
[hts-users:01190] How does HTK/HTS knows if the frame is voiced/unvoiced?



Dear all,

I am trying to understand how HTK/HTS determines which spaces apply to each frame.

In Yoshimura's thesis (pages 22-28), it seems that each observation
includes both the parameters (e.g. mcp, lf0) and the spaces that
apply to the frame. He calls this the space selector, S(o_t).

I would expect that for every frame (assuming there are no delta features),
the S(o) information is coded explicitly:
   voiced frames => mcp and lf0
   unvoiced frames => mcp

However, in the cmp files, the only information that signals whether a frame is voiced
is the lf0 value itself: for unvoiced frames the value is log(0).
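
To make this convention concrete, here is a minimal sketch of how the
spaces of a frame could be recovered from the lf0 value alone. It only
illustrates the file convention described above; it is not HTK/HTS code.
The names is_voiced and space_indices are hypothetical, and the constants
assume that log(0) is stored as a large negative sentinel (HTK's HMath.h
defines LZERO as -1.0E10 and LSMALL as -0.5E10, to my knowledge).

```python
import math

# Assumption: log(0) is stored as a large negative sentinel value.
LZERO = -1.0e10   # stands in for log(0)
LSMALL = -0.5e10  # anything at or below this is treated as log(0)


def is_voiced(lf0):
    """Hypothetical helper: a frame is voiced if its lf0 is not log(0)."""
    return lf0 > LSMALL


def space_indices(lf0_track):
    """Return, per frame, the set of parameter types that are present."""
    return [{"mcp", "lf0"} if is_voiced(v) else {"mcp"} for v in lf0_track]


# Four frames: voiced, unvoiced, voiced, unvoiced.
track = [math.log(120.0), LZERO, math.log(118.5), LZERO]
for t, spaces in enumerate(space_indices(track)):
    print(t, sorted(spaces))
```

With this reading, S(o) is implicit in the data, so no extra field
per frame would be needed in the cmp file.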

My question is: how does HTK/HTS identify the space indices of each observation?
Does it detect the log(0) value?
I have seen that the prototype file (proto/*) describes which
streams are MSD and which are not, but I still don't know how each particular frame is identified.

I am asking because we want to use different spectral parameters (with different vector sizes)
for voiced and unvoiced frames.


Thank you in advance for your time.
Antonio


Follow-Ups
[hts-users:01191] Re: How does HTK/HTS knows if the frame is voiced/unvoiced?, Thomas WANG
[hts-users:01193] Re: How does HTK/HTS knows if the frame is voiced/unvoiced?, Heiga ZEN (Byung Ha CHUN)