
[hts-users:00081] Re: f0 extraction using pda on raw sound files


Hi Anders,

Anders Lundgren wrote:

For instance, what low/high freq boundaries should be set when extracting a male voice?

Such parameters depend on the data.

Also, are there any other parameters (i.e. voiced/voiceless threshold) that could affect HTS performance?

Yes, they would affect the performance.
To determine the f0 extraction parameters, I always check the quality of the analysis/synthesis speech (mel-cepstral vocoder) generated from the original data.

I created some f0 files using this utility and packed them as little-endian binary floats, but I receive the "ViterbiAlign: No path found in 8'th segment" error when training reaches "sil" (silence). I have successfully trained using the exact same data, but with f0 contours taken from the KTH "Snack" f0 extraction tool. The problem then is that many segments in sentence-final position become partially unvoiced, though there is no evidence for this in the training data.

Could you count the number of voiced/unvoiced frames assigned to the segment "sil" in the whole training data?
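
A rough sketch of such a count in Python is given below. It is only an illustration under several assumptions that are not stated in this thread: the f0 files are headerless little-endian 32-bit floats with one value per frame, frames with a value of 0 or less are unvoiced, the frame shift is 5 ms, and the labels are HTK-style lines of the form "start end name" with times in 100 ns units and plain segment names such as "sil". The function names are hypothetical, so please adjust them and the constants to your setup.

```python
import struct

FRAME_SHIFT_100NS = 50_000  # assumption: 5 ms frame shift, in 100 ns units
UNVOICED_MAX = 0.0          # assumption: values <= 0 mark unvoiced frames


def read_f0(path):
    """Read a headerless file of little-endian 32-bit floats (one per frame)."""
    with open(path, "rb") as f:
        data = f.read()
    n = len(data) // 4
    return list(struct.unpack("<%df" % n, data[:4 * n]))


def read_label(path):
    """Read HTK-style label lines: start end name (times in 100 ns units)."""
    segments = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 3:
                segments.append((int(parts[0]), int(parts[1]), parts[2]))
    return segments


def count_voicing(f0, segments, target="sil"):
    """Return (voiced, unvoiced) frame counts inside segments named target."""
    voiced = unvoiced = 0
    for start, end, name in segments:
        if name != target:
            continue
        first = start // FRAME_SHIFT_100NS
        last = min(end // FRAME_SHIFT_100NS, len(f0))
        for value in f0[first:last]:
            if value > UNVOICED_MAX:
                voiced += 1
            else:
                unvoiced += 1
    return voiced, unvoiced
```

Calling count_voicing(read_f0(f0_path), read_label(label_path)) for every utterance and summing the two counts gives the totals for the whole training set. A large difference between the pda-based and Snack-based totals for "sil" would point to the voiced/unvoiced decision as the place to look.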

Best regards,

Heiga Zen (Byung-Ha Chun)

--
 ------------------------------------------------
  Heiga Zen     (in Japanese pronunciation)
  Byung-Ha Chun (in Korean pronunciation)

  Department of Computer Science and Engineering
  Graduate School of Engineering
  Nagoya Institute of Technology
  Japan

  e-mail: zen@xxxxxxxxxxxxxxxx
     web: http://kt-lab.ics.nitech.ac.jp/~zen
 ------------------------------------------------


References
[hts-users:00080] f0 extraction using pda on raw sound files, Anders Lundgren