Hi,
I have used HTS to synthesize Chinese speech, with syllables as the units
for my HMMs. While building and running the system I ran into some questions.
1. About the list
In the "lists" directory, full.list always contains some malformed entries.
For example:
sil-liu+l/A:.....
iu-lei+sil/A:.....
sil-da+l/A.......a-lian+l/A..........
sil-da+l/A.........
In this example there is no line break between "sil-da+l/A:.." and
"a-lian+l/A": they are on the same line, even though they are different
syllable models.
I used the "makefile" from "HTS-demo_NIT-ATR503-M001" without any modification.
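
One way to see how many entries in full.list are affected is to scan it for lines that hold more than one label. The following is only a minimal sketch in Python, not part of HTS: the function name find_merged_lines is hypothetical, and it assumes that every full-context label contains the field marker "/A:" exactly once, so that a line with two or more markers has labels run together.

```python
def find_merged_lines(lines, marker="/A:"):
    """Return (line_number, text) pairs for lines holding more than one label.

    Assumption: each full-context label contains `marker` exactly once,
    so two or more markers on a line mean that labels were run together.
    """
    merged = []
    for number, text in enumerate(lines, start=1):
        text = text.rstrip("\n")
        if text.count(marker) > 1:
            merged.append((number, text))
    return merged
```

It could be applied to the file with something like `find_merged_lines(open("lists/full.list"))`, where the path is assumed from the directory name above.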
2. Some questions about f0
I use Tcl/Snack to extract f0 for HTS. I wrote a Tcl script and set
framelength=0.025, the same value that is used for the mel-cepstral analysis.
But when I use these f0 values to generate speech, the results are very bad.
I found that, for mel-cepstrum extraction, HTS needs
$sampfreq = 16000;
$framelength = 0.025;
$frameshift = 0.005;
$windowtype = 0;
$normtype = 1;
$FFTLength = 512;
$freqwarp = 0.42;
$mceporder = MCEPORDER;
but my f0 extraction only uses the frame length, which I set to the same
value as the mel-cepstrum frame length.
Is it necessary to set the frame shift and the other options when extracting f0?
However, Snack only provides the following options:
-method m
-start start
-end end
-framelength t
-windowlength length
-maxpitch val
-minpitch val
-progress callback
I also used "pda" to extract f0.
When I use those values to train the HMMs and synthesize speech,
the speech quality is also low.
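
For reference, the frame arithmetic implied by the settings above can be sketched as follows. This is only an illustration in Python, not part of HTS or Snack: the helper names are hypothetical, the frame-counting convention (no padding at the edges) is an assumption, and the second case assumes an f0 extractor that outputs one value every 0.025 s, which may not be how Snack or pda actually behave.

```python
def to_samples(seconds, sampfreq):
    """Convert a duration in seconds to a whole number of samples."""
    return int(round(seconds * sampfreq))


def frame_count(n_samples, frame_len, frame_shift):
    """Number of full analysis frames that fit in n_samples (no padding)."""
    if n_samples < frame_len:
        return 0
    return 1 + (n_samples - frame_len) // frame_shift


SAMPFREQ = 16000
frame_len = to_samples(0.025, SAMPFREQ)    # $framelength in samples
frame_shift = to_samples(0.005, SAMPFREQ)  # $frameshift in samples

one_second = SAMPFREQ  # one second of audio, in samples

# Mel-cepstrum stream: 25 ms window advanced by 5 ms.
mcep_frames = frame_count(one_second, frame_len, frame_shift)

# Hypothetical f0 stream: one value every 25 ms (an assumption, see above).
f0_frames = frame_count(one_second, frame_len, to_samples(0.025, SAMPFREQ))

print(frame_len, frame_shift, mcep_frames, f0_frames)
```

If the two counts differ for the same utterance, the f0 values and the mel-cepstrum vectors would not correspond frame by frame, so comparing the number of frames in both streams for one file is a quick check.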
3. About my speech
I suspect that f0 is what makes my synthesized speech unclear, so I extracted
f0 with both "pda" and Tcl/Snack, but neither gave a better result.
Could any other factors affect the clarity (articulation) of the speech?
Thank you.
Best regards