
[hts-users:00369] Re: my speech


Hi,

About question 1

I have solved that.

The problem was that there was no "enter" (newline) at the end of the full_lab file.
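
For anyone hitting the same symptom (two labels merged onto one line of full.list), a check like the following may help. It is only a minimal sketch: the function names are made up for illustration, and it assumes the label files are plain text that should end with a newline character.

```python
import os

def ends_with_newline(path):
    """Return True if the file is empty or its last byte is a newline."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return True
        f.seek(-1, os.SEEK_END)
        return f.read(1) == b"\n"

def ensure_trailing_newline(path):
    """Append a newline if it is missing. Return True if the file was changed."""
    if ends_with_newline(path):
        return False
    with open(path, "ab") as f:
        f.write(b"\n")
    return True
```

Running ensure_trailing_newline() over every label file before the files are concatenated should keep the last label of one file from joining the first label of the next.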

 

About question 2,3

I used "pda" to extract f0 from the voices in the CMU ARCTIC database,

and compared the results with the f0 values included in the CMU ARCTIC database,

but there is a great difference between them.
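
To put a number on "great difference", the two tracks can be compared frame by frame. The sketch below is an assumption-heavy illustration, not the method used in HTS: it assumes both f0 files are raw 32-bit floats in native byte order, one value in Hz per frame, with 0 marking unvoiced frames. The function names and the 20% tolerance are hypothetical choices.

```python
import math
import struct

def read_f0(path):
    """Read a raw float32 file (native byte order) into a list of floats."""
    with open(path, "rb") as f:
        data = f.read()
    n = len(data) // 4
    return list(struct.unpack("%df" % n, data[:4 * n]))

def compare_f0(ref, test, tol=0.2):
    """Count voicing mismatches, gross errors and octave errors per frame."""
    stats = {"frames_ref": len(ref), "frames_test": len(test),
             "both_voiced": 0, "voicing_mismatch": 0,
             "gross_errors": 0, "octave_errors": 0}
    # zip() stops at the shorter track; a large length gap is itself a warning
    for r, t in zip(ref, test):
        ref_voiced, test_voiced = r > 0, t > 0
        if ref_voiced != test_voiced:
            stats["voicing_mismatch"] += 1
        elif ref_voiced:
            stats["both_voiced"] += 1
            ratio = t / r
            if abs(ratio - 1.0) > tol:
                stats["gross_errors"] += 1
                # ratio near 0.5 or 2.0 suggests half or double pitch
                if abs(abs(math.log2(ratio)) - 1.0) < 0.15:
                    stats["octave_errors"] += 1
    return stats
```

If most gross errors are also octave errors, the f0 search range is a likely suspect. If the two frame counts differ by a large factor, the frame shift probably differs between the two extractions.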

I used "pda $name.wav -L -S 0.025 -length 0.005 -fmax 200 -fmin 80 | x2x +af > $name.f0"
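
Because the frame shift for f0 extraction has to match the 0.005 s shift used for mel-cepstral analysis (see the quoted reply below), a quick sanity check is to compare the number of f0 frames with what that shift implies. The sketch below does not interpret any pda option; it only looks at output sizes. The function names are hypothetical, and the exact frame count of a real tool may differ by a few frames depending on how it pads the signal edges.

```python
def frames_for(num_samples, sampfreq, frameshift):
    """Approximate number of frames for a given shift (in seconds)."""
    shift_samples = round(sampfreq * frameshift)
    return num_samples // shift_samples

def implied_frameshift(num_samples, sampfreq, num_frames):
    """Frame shift in seconds implied by an observed frame count."""
    return num_samples / sampfreq / num_frames

# Example: a 3-second utterance sampled at 16 kHz
num_samples = 3 * 16000
print(frames_for(num_samples, 16000, 0.005))      # frames at a 5 ms shift
print(frames_for(num_samples, 16000, 0.025))      # frames at a 25 ms shift
print(implied_frameshift(num_samples, 16000, 120))
```

An f0 file with roughly one fifth of the expected frames would indicate that the extractor ran with a 25 ms shift instead of 5 ms.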

I will try other methods.

Thank you!

Best regards

 


From:  "Heiga ZEN (Byung Ha CHUN)" <zen@xxxxxxxxxxxxxxxx>
Reply-To:  hts-users@xxxxxxxxxxxxxxxxxxxxxxxxx
To:  hts-users@xxxxxxxxxxxxxxxxxxxxxxxxx
Subject:  [hts-users:00368] Re: my speech
Date:  Wed, 12 Jul 2006 18:35:42 +0900
>Hi
>
>liu lei wrote:
>
>>I have used HTS to generate Chinese speech. I use syllables to construct
>>my HMM models, and in the course of synthesizing
>>I ran into some questions.
>>1. About the list
>>  In the "lists" directory, full.list always contains some
>>wrong information.
>>For example:
>>sil-liu+l/A:.....
>>iu-lei+sil/A:.....
>>sil-da+l/A.......a-lian+l/A..........
>>sil-da+l/A.........
>>In the example,
>>between "sil-da+l/A:.." and "a-lian+l/A" there is no "enter";
>>they are on the same line, but they are different syllable models.
>>I use the "makefile" in "HTS-demo_NIT-ATR503-M001"
>>and made no modifications.
>
>They are automatically generated from your full.mlf, so please check
>your mlf whether it includes such lines or not.
>
>>2. Some puzzles about f0
>>I use tcl/snack to get f0 values for HTS. I wrote a Tcl script
>>and set framelength=0.025, the value used for mel-cepstral analysis.
>>But when I use them to generate speech, the results are very bad.
>>I found that, when extracting mel-cepstra, HTS needs
>>$sampfreq    = 16000; $framelength = 0.025; $frameshift  = 0.005;
>>$windowtype  = 0;    $normtype    = 1;     $FFTLength   = 512;
>>$freqwarp    = 0.42;  $mceporder   = MCEPORDER;
>>but my f0 extraction only uses the frame length,
>>which I set to the same value as the mel-cepstral frame length.
>>I want to know:
>>is it necessary to set the frame shift and other options for getting f0?
>
>I think the frame shift for f0 extraction has to be equal to that of
>mel-cepstral analysis.
>And you should optimize the f0 search range to avoid half/double pitch errors.
>
>>3. About my speech
>>I guess it is the f0 that makes my speech unclear,
>>so I used "pda" and "tcl/snack" to get f0 respectively,
>>but I didn't get a better result.
>>Can any other factors affect the speech's articulation?
>
>Have you ever tried extracting f0s from the CMU ARCTIC databases
>(HTS-demo) using your pda/get_f0 and training HTS with them?
>I think comparing HTS voices trained using the f0s included in the database
>with those trained using f0s extracted by your tools will show whether
>the f0 extraction method causes your problem or not.
>
>Regards,
>
>Heiga Zen (Byung Ha Chun)
>
>--
>------------------------------------------------
>Heiga ZEN     (in Japanese pronunciation)
>Byung Ha CHUN (in Korean pronunciation)
>
>Department of Computer Science and Engineering
>Nagoya Institute of Technology
>Gokiso-cho, Showa-ku, Nagoya 466-8555 Japan
>
>http://kt-lab.ics.nitech.ac.jp/~zen
>------------------------------------------------
>

