[hts-users:00081] Re: f0 extraction using pda on raw sound files
- Subject: [hts-users:00081] Re: f0 extraction using pda on raw sound files
- From: "Heiga Zen (Byung-Ha Chun)" <zen@xxxxxxxxxxxxxxxx>
- Date: Tue, 07 Dec 2004 00:29:56 -0500
- Organization: Nagoya Institute of Technology, Japan
- User-agent: Mozilla Thunderbird 0.8 (Windows/20040913)
Hi Anders,
Anders Lundgren wrote:
> For instance, what low/high freq boundaries should be set
> when extracting a male voice?
Those kinds of parameters depend on the data.
> Also, are there any other parameters (i.e.
> voiced/voiceless threshold) that could affect HTS performance?
They would affect the performance.
I always check the quality of speech resynthesized from the original data
by analysis/synthesis (mel-cepstral vocoder) to determine the f0
extraction parameters.
> I created some f0 files using this utility and packed them as binary
> little-endian floats, but I receive the error "ViterbiAlign: No path
> found in 8'th segment" when training reaches "sil" (silence). I have
> successfully trained using the exact same data, but with f0 contours
> taken from the KTH "Snack" f0 extraction tool. The problem then is that
> many segments in sentence-final position become partially unvoiced,
> though there is no evidence for this in the training data.
Could you count the number of voiced and unvoiced frames assigned to the
"sil" segments across the whole training data?
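
A count like this could be sketched roughly as follows. This is only an
illustration under several assumptions that are not stated in the thread:
the f0 files are headerless little-endian 32-bit floats with one value per
frame, the labels are HTK-style lines ("start end name", times in 100 ns
units) with a plain "sil" name in the third column, the frame shift is
5 ms, and unvoiced frames are stored as a value at or below zero (which
covers both 0.0 in f0 files and -1.0e+10 in log f0 files). The function
names are hypothetical.

```python
import struct

FRAME_SHIFT_100NS = 50000  # assumption: 5 ms frame shift, in 100 ns units
UNVOICED_MAX = 0.0         # assumption: values <= 0 mark unvoiced frames


def read_f0(path):
    """Read a headerless file of little-endian 32-bit floats."""
    with open(path, "rb") as fh:
        data = fh.read()
    n = len(data) // 4
    return list(struct.unpack("<%df" % n, data[: n * 4]))


def read_label(path):
    """Read HTK-style label lines: start end name (times in 100 ns units)."""
    segments = []
    with open(path) as fh:
        for line in fh:
            parts = line.split()
            if len(parts) >= 3:
                segments.append((int(parts[0]), int(parts[1]), parts[2]))
    return segments


def count_voicing(f0, segments, name="sil"):
    """Return (voiced, unvoiced) frame counts inside segments called name."""
    voiced = unvoiced = 0
    for start, end, label in segments:
        if label != name:
            continue
        first = start // FRAME_SHIFT_100NS
        last = min(end // FRAME_SHIFT_100NS, len(f0))
        for value in f0[first:last]:
            if value > UNVOICED_MAX:
                voiced += 1
            else:
                unvoiced += 1
    return voiced, unvoiced
```

For the whole training set, call count_voicing(read_f0(...),
read_label(...)) on each utterance and add up the two counts. If "sil"
turns out to contain only unvoiced frames (or only voiced ones), that
would be worth reporting.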
Best regards,
Heiga Zen (Byung-Ha Chun)
--
------------------------------------------------
Heiga Zen (in Japanese pronunciation)
Byung-Ha Chun (in Korean pronunciation)
Department of Computer Science and Engineering
Graduate School of Engineering
Nagoya Institute of Technology
Japan
e-mail: zen@xxxxxxxxxxxxxxxx
web: http://kt-lab.ics.nitech.ac.jp/~zen
------------------------------------------------
- References
- [hts-users:00080] f0 extraction using pda on raw sound files, Anders Lundgren