[hts-users:01190] How does HTK/HTS knows if the frame is voiced/unvoiced?
Dear all,
I am trying to understand how HTK/HTS knows which space each frame belongs to.
In Yoshimura's thesis (pages 22-28), it seems that each observation
includes the parameters (e.g. mcp, lf0) and also the spaces that
apply to the frame. He calls this the space selector, S(o_t).
I would expect that for every frame (assuming there are no delta
features),
we encode the S(o) information:
voiced frames => mcp and lf0
unvoiced frames => mcp
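
To make this concrete, here is a minimal Python sketch of the per-frame
encoding I have in mind. The names (Observation, space, make_observation)
are my own and purely illustrative; they do not come from HTK/HTS or from
the thesis.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Observation:
    """One frame with an explicit space label (hypothetical layout)."""
    space: str                # "voiced" or "unvoiced", standing in for S(o)
    mcp: Tuple[float, ...]    # spectral parameters, present in every frame
    lf0: Optional[float]      # log F0, present only in the voiced space


def make_observation(mcp: Sequence[float],
                     lf0: Optional[float] = None) -> Observation:
    """Build a frame; the space follows from whether lf0 is supplied."""
    space = "voiced" if lf0 is not None else "unvoiced"
    return Observation(space=space, mcp=tuple(mcp), lf0=lf0)
```

With such a layout the space would be stored explicitly in each frame,
rather than inferred from a parameter value.
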
However, in the cmp files, the only information that signals whether the
frame is voiced or not
is the lf0 value itself: for unvoiced frames the value is log(0).
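
For illustration, this is how a voicing decision could be recovered from
the lf0 values alone. It is only a sketch: I am assuming that log(0) is
stored as a very large negative number, and both the threshold
UNVOICED_THRESHOLD and the function name voiced_flags are hypothetical,
not taken from the HTK/HTS sources.

```python
import math
from typing import List, Sequence

# Assumed cut-off: any lf0 at or below this is treated as the log(0) marker.
UNVOICED_THRESHOLD = -1.0e9


def voiced_flags(lf0_track: Sequence[float]) -> List[bool]:
    """Return True for voiced frames, False for frames carrying the marker.

    Non-finite values (-inf, nan) are also treated as unvoiced.
    """
    return [math.isfinite(v) and v > UNVOICED_THRESHOLD for v in lf0_track]
```

Under these assumptions, voiced_flags([5.2, -1.0e10, 5.3]) gives
[True, False, True].
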
My question is: how does HTK/HTS identify the space indexes of an observation?
Does it detect the log(0) value?
I have seen that the prototype file (proto/*) describes which
streams are MSD and which are not, but I still don't know how each
particular frame is identified.
I am asking this question because we want to use different spectral
parameters (and different vector sizes)
for voiced and unvoiced frames.
Thank you in advance for your time.
Antonio
- Follow-Ups
- [hts-users:01191] Re: How does HTK/HTS knows if the frame is voiced/unvoiced?, Thomas WANG
- [hts-users:01193] Re: How does HTK/HTS knows if the frame is voiced/unvoiced?, Heiga ZEN (Byung Ha CHUN)