[hts-users:00368] Re: my speech
- Subject: [hts-users:00368] Re: my speech
- From: "Heiga ZEN (Byung Ha CHUN)" <zen@xxxxxxxxxxxxxxxx>
- Date: Wed, 12 Jul 2006 18:35:42 +0900
Hi
liu lei wrote:
I have used HTS to generate Chinese speech. I use syllables as the units of my HMM
models, and while synthesizing
I ran into some problems.
1. About the list
In the "lists" directory, full.list always contains some wrong entries.
For example:
sil-liu+l/A:.....
iu-lei+sil/A:.....
sil-da+l/A.......a-lian+l/A..........
sil-da+l/A.........
In the example,
there is no line break between "sil-da+l/A:.." and "a-lian+l/A":
they are on the same line, but they are different syllable models.
I use the Makefile in "HTS-demo_NIT-ATR503-M001"
and made no modifications.
They are automatically generated from your full.mlf, so please check whether your mlf contains such lines.
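A minimal Python sketch of that check, assuming each full-context label contains exactly one "/A:" field, as in the examples above. A line carrying two or more such markers has most likely lost its newline. The function name `find_merged_labels` is hypothetical and not part of HTS.

```python
from typing import List, Tuple

# Assumption: every full-context label has exactly one "/A:" field,
# e.g. "sil-liu+l/A:...". MLF header lines ("#!MLF!#", file names, ".")
# contain no marker and are therefore ignored.
LABEL_MARKER = "/A:"


def find_merged_labels(lines: List[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) for lines that hold more than one label.

    Line numbers are 1-based so they match what a text editor shows.
    """
    suspicious = []
    for lineno, line in enumerate(lines, start=1):
        # Two or more markers on one line means two labels were joined.
        if line.count(LABEL_MARKER) > 1:
            suspicious.append((lineno, line.rstrip("\n")))
    return suspicious
```

For example, read the lines of your full.mlf with `open(...).readlines()`, pass them in, and inspect the returned line numbers; since the counting ignores any start/end times in front of the label, it works on both MLF lines and full.list entries.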
2. Some puzzles about F0
I use Tcl/Snack to extract F0 for HTS. I wrote a Tcl script
and set framelength=0.025, the same value used for mel-cepstral analysis.
But when I use these F0 values to generate speech, the results are very poor.
I found that for mel-cepstral extraction, HTS uses
$sampfreq = 16000;
$framelength = 0.025;
$frameshift = 0.005;
$windowtype = 0;
$normtype = 1;
$FFTLength = 512;
$freqwarp = 0.42;
$mceporder = MCEPORDER;
but my F0 extraction only uses the frame length,
which I set to the same value as the mel-cepstral frame length.
I want to know:
is it necessary to set the frame shift and other options when extracting F0?
I think the frame shift for F0 extraction has to be equal to that of mel-cepstral analysis ($frameshift = 0.005 above).
You should also optimize the F0 search range to avoid half/double pitch errors.
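A minimal Python sketch of two sanity checks for these points. It assumes the F0 file holds one value per frame with 0 for unvoiced frames, and that a tool emits roughly one frame every `frameshift` seconds (exact counts differ by a frame or two between tools, hence the tolerance). The helper names and the 1.8 to 2.2 ratio window for octave jumps are hypothetical heuristics, not part of HTS.

```python
from typing import List

SAMPFREQ = 16000    # $sampfreq in the settings quoted above
FRAMESHIFT = 0.005  # $frameshift used for mel-cepstral analysis


def num_frames(num_samples: int, sampfreq: int = SAMPFREQ,
               frameshift: float = FRAMESHIFT) -> int:
    """Approximate frame count with one frame every `frameshift` seconds."""
    shift = int(round(sampfreq * frameshift))  # 80 samples at 16 kHz, 5 ms
    return num_samples // shift + 1


def streams_aligned(n_f0: int, n_mcep: int, tolerance: int = 2) -> bool:
    """True if the F0 and mel-cepstrum streams have (nearly) equal length."""
    return abs(n_f0 - n_mcep) <= tolerance


def octave_jumps(f0: List[float], low: float = 1.8,
                 high: float = 2.2) -> List[int]:
    """Indices where F0 roughly doubles or halves between voiced frames.

    Frames coded as 0 (unvoiced) are skipped. Such jumps often point to
    half/double pitch errors caused by a too-wide F0 search range.
    """
    jumps = []
    for i in range(1, len(f0)):
        prev, cur = f0[i - 1], f0[i]
        if prev > 0 and cur > 0:
            ratio = max(prev, cur) / min(prev, cur)
            if low <= ratio <= high:
                jumps.append(i)
    return jumps
```

For 2 s of 16 kHz audio, a 5 ms shift gives about 401 frames while a 25 ms shift gives about 81, so F0 extracted with a 25 ms step would cover only about a fifth as many frames as the mel-cepstrum stream.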
3. About my speech
I guessed that F0 is what makes my speech unclear,
so I used "pda" and "tcl/snack" to extract F0 separately,
but did not get better results.
Are there any other factors that can affect the articulation of the speech?
Have you ever tried extracting F0 from the CMU ARCTIC databases (used in HTS-demo) with your pda/get_f0 and training HTS on it?
I think comparing HTS voices trained with the F0 values included in the database and with those extracted by your tools will show whether the F0 extraction method causes your problem.
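Training two voices is the definitive test. A cheaper preliminary check, swapped in here and not part of the suggestion above, is to compare the two F0 tracks directly on the same utterances using two common measures: voicing decision error (VDE) and gross pitch error (GPE, here with a 20% threshold). The Python sketch below assumes both tracks use the same frame shift, one value per frame, with 0 for unvoiced frames; the function name and the half/double ratio windows are hypothetical.

```python
from typing import Dict, List


def compare_f0(ref: List[float], est: List[float],
               gpe_threshold: float = 0.2) -> Dict[str, float]:
    """Frame-by-frame comparison of a reference and an estimated F0 track.

    Returns:
      vde:    fraction of frames whose voiced/unvoiced decision differs
      gpe:    fraction of frames voiced in both tracks whose F0 deviates
              from ref by more than gpe_threshold (relative)
      octave: fraction of frames voiced in both tracks where est is
              roughly half or double ref
    """
    n = min(len(ref), len(est))  # tolerate small frame-count differences
    if n == 0:
        raise ValueError("empty F0 track")
    vd_err = both = gross = octave = 0
    for r, e in zip(ref[:n], est[:n]):
        if (r > 0) != (e > 0):
            vd_err += 1  # one track voiced, the other unvoiced
        elif r > 0:
            both += 1
            if abs(e - r) / r > gpe_threshold:
                gross += 1
                ratio = e / r
                if 1.8 <= ratio <= 2.2 or 0.45 <= ratio <= 0.55:
                    octave += 1  # likely a half/double pitch error
    return {
        "vde": vd_err / n,
        "gpe": gross / both if both else 0.0,
        "octave": octave / both if both else 0.0,
    }
```

A high `octave` share suggests tightening the F0 search range, while a high `vde` points at the voicing decision rather than the F0 values themselves.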
Regards,
Heiga Zen (Byung Ha Chun)
--
------------------------------------------------
Heiga ZEN (in Japanese pronunciation)
Byung Ha CHUN (in Korean pronunciation)
Department of Computer Science and Engineering
Nagoya Institute of Technology
Gokiso-cho, Showa-ku, Nagoya 466-8555 Japan
http://kt-lab.ics.nitech.ac.jp/~zen
------------------------------------------------