[hts-users:00195] Re: diffrences between merge and cat
- Subject: [hts-users:00195] Re: diffrences between merge and cat
- From: "Heiga ZEN (Byung Ha CHUN)" <zen@xxxxxxxxxxxxxxxx>
- Date: Wed, 11 Jan 2006 12:48:22 +0900
Hi,
liulei_198216@xxx wrote:
> 1. The HTS voice, which contains hts_engine, is registered by
> \festvox\cstr_us_ked_timit_hts.scm.
> That file tells Festival where the HTS voice is.
> For example, the following command changes the current voice:
> festival> (voice_cstr_us_ked_timit_hts)
Yes.
> 2. \festvox\cstr_us_ked_timit_hts.scm loads \hts\hts.scm, and
> \hts\hts.scm calls \hts\htsvoice.pl, which uses hts_engine to
> synthesize speech.
Yes.
> 3. \hts\hts.scm also defines some variables through which we can modify speech
> features, such as f0.
Yes.
> In \hts\htsvoice.pl, I found that hts_engine synthesizes speech
> using .pdf files.
> Does hts_engine generate the parameters from the
> .pdf files, or do the .pdf files already hold parameters in a format that
> can be used directly to synthesize speech?
In .pdf files, statistics of HMMs are stored.
For example, mcep.pdf contains statistics of mel-cepstrum stream of HMMs.
After loading these pdf files and decision trees (.inf files), hts_engine composes a sentence HMM corresponding to a
given label sequence.
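As a rough illustration of this step, here is a minimal sketch in Python of composing a sentence HMM by decision-tree lookup. It is not hts_engine code: the label format ("left-center+right"), the toy tree, the questions and the pdf values are all made up for the example, and a real voice has several states per model with separate trees per state and per stream.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Gaussian:
    """One leaf pdf: mean and variance of a single (toy) feature."""
    mean: float
    var: float


@dataclass
class Node:
    """Internal tree node: a yes/no question about the context label."""
    question: Callable[[str], bool]
    yes: Union["Node", int]   # child node, or index of a leaf pdf
    no: Union["Node", int]


def center(label: str) -> str:
    """Extract the center phone from a hypothetical 'l-c+r' label."""
    return label.split("-")[1].split("+")[0]


def find_leaf(tree: Union[Node, int], label: str) -> int:
    """Walk the tree, answering each question, until a leaf index is reached."""
    node = tree
    while isinstance(node, Node):
        node = node.yes if node.question(label) else node.no
    return node


VOWELS = {"a", "i", "u", "e", "o"}

# Toy tree (assumption): first ask "is the center phone a vowel?",
# then, for vowels, "is the left context silence?".
TREE = Node(
    question=lambda lab: center(lab) in VOWELS,
    yes=Node(question=lambda lab: lab.startswith("sil-"), yes=0, no=1),
    no=2,
)

# Toy pdf table (assumption): the statistics a .pdf file would supply.
PDFS = [Gaussian(5.0, 0.2), Gaussian(4.5, 0.3), Gaussian(1.0, 0.5)]


def compose_sentence_hmm(labels: List[str]) -> List[Gaussian]:
    """Concatenate the pdf selected for each label into one sequence."""
    return [PDFS[find_leaf(TREE, lab)] for lab in labels]


if __name__ == "__main__":
    for g in compose_sentence_hmm(["sil-a+k", "a-k+i", "k-i+sil"]):
        print(g)
```

The point of the sketch is only that the trees map every context label, including unseen ones, to one of the stored pdfs, so the sentence HMM is a concatenation of pdfs picked from the tree leaves.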
Then speech parameter sequences (mel-cepstrum and f0 sequences) are generated from the sentence HMM using the speech
parameter generation algorithm (please see mlpg.c in the hts_engine release).
Finally, a speech waveform is synthesized from the generated speech parameter sequences using the MLSA filter.
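To give an idea of what the parameter generation step does, here is a minimal sketch in Python. It is not the code in mlpg.c: it handles one feature dimension, uses a single delta window (-0.5, 0, 0.5), clamps indices at the utterance boundaries, and solves the normal equations (W' P W) c = W' P mu with a plain dense solver. All function names are hypothetical, and those simplifications are assumptions made for the example. Practical implementations typically exploit the banded structure of the matrix instead.

```python
from typing import Dict, List


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def mlpg(static_mean: List[float], static_var: List[float],
         delta_mean: List[float], delta_var: List[float]) -> List[float]:
    """Return the static trajectory c that best matches both the static
    means and the delta means, weighted by the inverse variances."""
    T = len(static_mean)
    A = [[0.0] * T for _ in range(T)]   # accumulates W' P W
    b = [0.0] * T                       # accumulates W' P mu

    def accumulate(coeffs: Dict[int, float], mean: float, var: float) -> None:
        prec = 1.0 / var
        for i, wi in coeffs.items():
            b[i] += wi * prec * mean
            for j, wj in coeffs.items():
                A[i][j] += wi * prec * wj

    for t in range(T):
        # Static row of W: picks out c[t].
        accumulate({t: 1.0}, static_mean[t], static_var[t])
        # Delta row of W: 0.5 * (c[t+1] - c[t-1]), indices clamped at the ends.
        row: Dict[int, float] = {}
        nxt, prv = min(t + 1, T - 1), max(t - 1, 0)
        row[nxt] = row.get(nxt, 0.0) + 0.5
        row[prv] = row.get(prv, 0.0) - 0.5
        accumulate(row, delta_mean[t], delta_var[t])
    return solve(A, b)


if __name__ == "__main__":
    # A step in the state means becomes a smooth transition.
    means = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    print(mlpg(means, [1.0] * 6, [0.0] * 6, [0.1] * 6))
```

Without the delta statistics the generated trajectory would simply be the piecewise-constant sequence of state means. The delta constraints are what make it vary smoothly across state boundaries.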
> In the \festvox directory
> there are some other files besides cstr_us_ked_timit_hts.scm, such as
> cstr_us_ked_timit_hts_phoneset.scm
> cstr_us_ked_timit_hts_tokenizer.scm
> cstr_us_ked_timit_hts_tagger.scm
> cstr_us_ked_timit_hts_lexicon.scm
> cstr_us_ked_timit_hts_phrasing.scm
> cstr_us_ked_timit_hts_intonation.scm
> cstr_us_ked_timit_hts_duration.scm
> cstr_us_ked_timit_hts_f0model.scm
> cstr_us_ked_timit_hts_other.scm
> I have not found what effect they have on the speech synthesis process.
> Why does the "HTS voices for Festival" demo provide them?
They are provided to define the text processing conditions (e.g., POS tagging and the phone set).
Regards,
Heiga ZEN (Byung Ha CHUN)
--
------------------------------------------------
Heiga ZEN (in Japanese pronunciation)
Byung Ha CHUN (in Korean pronunciation)
Department of Computer Science and Engineering
Graduate School of Engineering
Nagoya Institute of Technology
Japan
e-mail: zen@xxxxxxxxxxxxxxxx
web: http://kt-lab.ics.nitech.ac.jp/~zen
------------------------------------------------
- References: [hts-users:00194] Re: diffrences between merge and cat, 刘 磊