
[hts-users:00368] Re: my speech

Subject: [hts-users:00368] Re: my speech
From: "Heiga ZEN (Byung Ha CHUN)" <zen@xxxxxxxxxxxxxxxx>
Date: Wed, 12 Jul 2006 18:35:42 +0900

Hi

liu lei wrote:

I have used HTS to generate Chinese speech. I use syllables as the units for my HMM models, and some questions came up during synthesis.

1. About the list
In the "lists" directory, full.list always contains some wrong entries.
For example:
sil-liu+l/A:.....
iu-lei+sil/A:.....
sil-da+l/A.......a-lian+l/A..........
sil-da+l/A.........
In this example, there is no line break between "sil-da+l/A:.." and "a-lian+l/A".
They are on the same line, but they are different syllable models.
I use the Makefile from "HTS-demo_NIT-ATR503-M001" without any modification.


The lists are generated automatically from your full.mlf, so please check whether your MLF itself contains such lines.
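One way to run this check is a small script that flags any line carrying more than one label. The following is only a sketch: it assumes every full-context label contains the marker "/A:" exactly once, as in the examples quoted above, and the function name is made up for illustration.

```python
def find_merged_label_lines(lines, marker="/A:"):
    """Return (line_number, text) for every line where `marker` occurs more than once.

    Assumption: each full-context label contains `marker` exactly once,
    so two or more occurrences suggest two labels joined on one line.
    """
    suspicious = []
    for number, line in enumerate(lines, start=1):
        if line.count(marker) > 1:
            suspicious.append((number, line.rstrip("\n")))
    return suspicious


# Placeholder labels modelled on the examples in the question; "..." is elided context.
sample = [
    "sil-liu+l/A:...\n",
    "iu-lei+sil/A:...\n",
    "sil-da+l/A:...a-lian+l/A:...\n",
]

for number, text in find_merged_label_lines(sample):
    print(f"line {number}: {text}")
```

To check a real file, pass an open file object instead of the sample list, for example `find_merged_label_lines(open("full.mlf"))`, adjusting the path to wherever your MLF is stored.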

2. Some puzzles about f0
I use Tcl/Snack to extract f0 for HTS. I wrote a Tcl script
and set framelength=0.025, the same value used for the mel-cepstral analysis.
However, when I use these f0 values to generate speech, the result is very bad.
I found that for mel-cepstrum extraction, HTS needs:
$sampfreq = 16000;
$framelength = 0.025;
$frameshift = 0.005;
$windowtype = 0;
$normtype = 1;
$FFTLength = 512;
$freqwarp = 0.42;
$mceporder = MCEPORDER;
but my f0 extraction uses only the frame length,
set to the same value as the mel-cepstral frame length.
I would like to know:
is it necessary to set the frame shift and other options when extracting f0?


I think the frame shift for f0 extraction has to be equal to that of the mel-cepstral analysis.
You should also optimize the f0 search range to avoid half/double pitch errors.
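To see why the frame shift matters, the arithmetic can be sketched as follows. This is a simplified illustration, assuming one analysis frame per shift and ignoring how a particular tool handles the edges of the signal; the function name is hypothetical.

```python
def frame_count(num_samples, sample_rate, frame_shift_sec):
    """Approximate number of analysis frames for a given frame shift.

    Assumption: one frame per shift, with no padding at the signal edges.
    Real tools may differ by a frame or two.
    """
    shift_in_samples = round(sample_rate * frame_shift_sec)
    return num_samples // shift_in_samples


one_second = 16000  # samples in 1 s of speech at 16 kHz

# Frame shift used for the mel-cepstral analysis ($frameshift = 0.005).
mcep_frames = frame_count(one_second, 16000, 0.005)

# What happens if the f0 tool steps by 0.025 s instead.
f0_frames = frame_count(one_second, 16000, 0.025)

print(mcep_frames, f0_frames)
```

With these settings the spectral stream has five times as many frames as the f0 stream, so the two can no longer be paired frame by frame. It is also worth checking how your f0 tool interprets its frame length option, since some tools use that name for the step between values rather than for the window size.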

3. About my speech
I guessed that f0 was what made my speech unclear,
so I tried extracting f0 with both "pda" and Tcl/Snack,
but the result did not improve.
Are there any other factors that can affect the articulation of the speech?


Have you tried extracting f0 from the CMU ARCTIC database (HTS-demo) using your pda/get_f0 and training HTS on the result?

I think comparing an HTS system trained on the f0 values included in the database with one trained on the f0 values extracted by your tools will show whether the f0 extraction method is causing your problem.
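Before retraining, a lighter first check is to compare the two f0 tracks directly. This is not the same as the training comparison suggested above, only a quick sanity check. The sketch below assumes both tracks use the same frame shift, give f0 in Hz, and mark unvoiced frames with 0; the function name and the sample values are made up for illustration.

```python
import math


def compare_f0(reference, extracted):
    """Return (voicing_error_rate, rmse_hz) for two f0 tracks.

    Assumptions: same frame shift in both tracks, f0 in Hz,
    unvoiced frames marked with 0. The RMSE is computed only over
    frames that both tracks consider voiced.
    """
    n = min(len(reference), len(extracted))
    if n == 0:
        return float("nan"), float("nan")

    voicing_errors = 0
    squared_errors = []
    for ref, ext in zip(reference[:n], extracted[:n]):
        ref_voiced, ext_voiced = ref > 0, ext > 0
        if ref_voiced != ext_voiced:
            voicing_errors += 1
        elif ref_voiced:
            squared_errors.append((ref - ext) ** 2)

    if squared_errors:
        rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    else:
        rmse = float("nan")
    return voicing_errors / n, rmse


# Made-up values: the third frame shows a doubled pitch, the fourth a voicing error.
database_f0 = [0.0, 100.0, 110.0, 120.0]
my_tool_f0 = [0.0, 100.0, 220.0, 0.0]

error_rate, rmse = compare_f0(database_f0, my_tool_f0)
print(error_rate, rmse)
```

A large RMSE with many values near twice or half the reference would point towards half/double pitch errors, while a high voicing error rate would point towards the voiced/unvoiced decision.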

Regards,

Heiga Zen (Byung Ha Chun)

--
------------------------------------------------
Heiga ZEN     (in Japanese pronunciation)
Byung Ha CHUN (in Korean pronunciation)

Department of Computer Science and Engineering
Nagoya Institute of Technology
Gokiso-cho, Showa-ku, Nagoya 466-8555 Japan

http://kt-lab.ics.nitech.ac.jp/~zen
------------------------------------------------
