
[hts-users:00195] Re: diffrences between merge and cat


Hi,

liulei_198216@xxx wrote:

1. The HTS voice, which contains hts_engine, is registered by \festvox\cstr_us_ked_timit_hts.scm.
 That file tells Festival where the HTS voice is.
 For example, the following command changes the current voice:
 festival > (voice_cstr_us_ked_timit_hts)

Yes.

2. \festvox\cstr_us_ked_timit_hts.scm loads \hts\hts.scm,
and \hts\hts.scm calls \hts\htsvoice.pl, which uses hts_engine to synthesize speech.

Yes.

3. \hts\hts.scm also defines some variables through which we can modify speech features, such as f0.

Yes.

 In \hts\htsvoice.pl, I found that hts_engine synthesizes speech
 using .pdf files.
Does hts_engine generate parameters from the .pdf files, or are the .pdf files already in a parameter format that can be used directly to synthesize speech?

The .pdf files store the statistics of the HMMs.
For example, mcep.pdf contains the statistics of the mel-cepstrum stream of the HMMs.
After loading these .pdf files and the decision trees (.inf files), hts_engine composes a sentence HMM corresponding to a given label sequence. Speech parameter sequences (mel-cepstrum and f0 sequences) are then generated from the sentence HMM using the speech parameter generation algorithm (please see mlpg.c in the hts_engine release).
Finally, a speech waveform is synthesized from the generated speech parameter sequences using the MLSA filter.
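
To give a rough idea of what the parameter generation step does, here is a minimal Python sketch for a single feature dimension. It is not the code in mlpg.c; it only illustrates the underlying idea of finding the static trajectory c that best matches the per-frame Gaussian statistics of both the static and the delta features, by solving (W' P W) c = W' P mu. The function names, the delta window 0.5 * (c[t+1] - c[t-1]), the clamping at the utterance boundaries and the dense solver are all assumptions made for this illustration.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):  # back substitution
        s = M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = s / M[k][k]
    return x


def generate_trajectory(static_mean, static_var, delta_mean, delta_var):
    """Return the static sequence c that solves (W' P W) c = W' P mu.

    Each frame contributes two rows to W: one for the static feature
    (coefficient 1 at frame t) and one for the delta feature
    (assumed window: 0.5 * (c[t+1] - c[t-1]), indices clamped at the ends).
    P is the diagonal matrix of precisions (1 / variance).
    """
    T = len(static_mean)
    rows = []  # each entry: (window coefficients, mean, precision)
    for t in range(T):
        rows.append(({t: 1.0}, static_mean[t], 1.0 / static_var[t]))
        win = {}
        left, right = max(t - 1, 0), min(t + 1, T - 1)
        win[left] = win.get(left, 0.0) - 0.5
        win[right] = win.get(right, 0.0) + 0.5
        rows.append((win, delta_mean[t], 1.0 / delta_var[t]))

    # Accumulate A = W' P W and b = W' P mu row by row.
    A = [[0.0] * T for _ in range(T)]
    b = [0.0] * T
    for coeffs, mean, prec in rows:
        for i, wi in coeffs.items():
            b[i] += wi * prec * mean
            for j, wj in coeffs.items():
                A[i][j] += wi * prec * wj
    return solve_linear(A, b)


if __name__ == "__main__":
    # Toy statistics: the static mean jumps from 0 to 1 between two "states",
    # while the delta statistics ask for little frame-to-frame change.
    mean = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    traj = generate_trajectory(mean, [1.0] * 6, [0.0] * 6, [0.1] * 6)
    print([round(v, 3) for v in traj])
```

In this toy setting the delta statistics pull the generated sequence away from the step-wise static means towards a smoother trajectory. A real implementation works on every dimension of every stream and exploits the band structure of W' P W instead of using a dense solver.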

 In the \festvox directory
 there are other files besides cstr_us_ked_timit_hts.scm, such as

 cstr_us_ked_timit_hts_phoneset.scm
 cstr_us_ked_timit_hts_tokenizer.scm
 cstr_us_ked_timit_hts_tagger.scm
 cstr_us_ked_timit_hts_lexicon.scm
 cstr_us_ked_timit_hts_phrasing.scm
 cstr_us_ked_timit_hts_intonation.scm
 cstr_us_ked_timit_hts_duration.scm
 cstr_us_ked_timit_hts_f0model.scm
 cstr_us_ked_timit_hts_other.scm

 I have not found what effect they have on the speech synthesis process.
 Why does the "HTS voices for Festival" demo provide them?

They are provided to define the text processing conditions (e.g., POS and phone set).

Regards,

Heiga ZEN (Byung Ha CHUN)

--
 ------------------------------------------------
  Heiga ZEN     (in Japanese pronunciation)
  Byung Ha CHUN (in Korean pronunciation)

  Department of Computer Science and Engineering
  Graduate School of Engineering
  Nagoya Institute of Technology
  Japan

  e-mail: zen@xxxxxxxxxxxxxxxx
     web: http://kt-lab.ics.nitech.ac.jp/~zen
 ------------------------------------------------


