[hts-users:00066] Re: HTS voice output problems
- Subject: [hts-users:00066] Re: HTS voice output problems
- From: "Heiga Zen (Byung-Ha Chun)" <zen@xxxxxxxxxxxxxxxx>
- Date: Sat, 14 Aug 2004 17:15:09 -0400
- Organization: Nagoya Institute of Technology, Japan
- User-agent: Mozilla Thunderbird 0.6 (Windows/20040502)
Hi Nicholas,
Nicholas Volk wrote:
I did `sox -r 16000 -w -s ms_272_16.raw ms_272_16.wav` to
give it a proper header. There's something that might be a vowel
just before 2 secs...
The new file is located at http://www.bitlips.fi/HTS/ms_272_16.wav .
I can hear it, thanks.
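For readers without sox, the same header can be added with a few lines of Python. This is only a minimal sketch: it assumes the raw file holds headerless 16-bit signed mono PCM at 16 kHz in little-endian byte order (the sample layout WAV expects), and the function name `raw_to_wav` is made up for illustration.

```python
import wave


def raw_to_wav(raw_path, wav_path, sample_rate=16000):
    """Wrap headerless 16-bit signed mono PCM in a WAV (RIFF) header.

    Assumption: the samples in raw_path are already little-endian.
    """
    with open(raw_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)          # mono
        w.setsampwidth(2)          # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        w.writeframes(pcm)         # header fields are filled in on close


# Example call (file names taken from the message above):
# raw_to_wav("ms_272_16.raw", "ms_272_16.wav")
```

If the result sounds like loud noise rather than speech, the raw samples are probably big-endian and need byte swapping first.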
There's something I must be doing wrong... I'll have to dig deeper.
I blindly copied the four win/*.win files for AWB.
Could this be a problem?
These files should be in little-endian format.
Please check the contents of these files with SPTK:
SPTK/bin/dmp +f ./mcep_dyn.win or SPTK/bin/dmp +f ./lf0_dyn.win
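If SPTK is not at hand, a rough check can also be done in Python. The sketch below assumes each .win file is nothing more than a flat sequence of 4-byte floats (which is how `dmp +f` reads it); the plausibility test and all function names are my own heuristic, not part of HTS or SPTK. The idea is that small window coefficients read with the wrong byte order usually turn into denormal or huge values.

```python
import math
import struct


def read_floats(path, byte_order):
    """Read a file of raw 4-byte floats; byte_order is '<' (little) or '>' (big)."""
    with open(path, "rb") as f:
        data = f.read()
    count = len(data) // 4
    return list(struct.unpack(f"{byte_order}{count}f", data[:count * 4]))


def looks_plausible(values, limit=10.0):
    """Heuristic: coefficients should be finite, and either zero or of modest size."""
    return all(
        math.isfinite(v) and (v == 0.0 or 1e-6 < abs(v) <= limit)
        for v in values
    )


def guess_byte_order(path):
    """Return 'little', 'big', or 'unknown' for a raw float file."""
    little = looks_plausible(read_floats(path, "<"))
    big = looks_plausible(read_floats(path, ">"))
    if little and not big:
        return "little"
    if big and not little:
        return "big"
    return "unknown"


# Example call (file name taken from the message above):
# print(guess_byte_order("./mcep_dyn.win"))
```

An answer of "big" would suggest the files were copied from a big-endian machine and need to be byte-swapped before use.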
Since I have a background in Festival and know very little about
HTK, I'd like to check one thing:
The question files contain a bunch of yes/no-questions.
Yes.
Each question contains the set of regexp-like possible yes-answers.
HTK supports only '*' and '?'.
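To test a question against your own labels outside HTK, the two wildcards can be emulated as below. This is a sketch under the assumption that '*' means "any run of characters, possibly empty" and '?' means "exactly one character", with the pattern having to cover the whole label; the label string and patterns are invented for illustration, and the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a pattern that uses only '*' and '?' into a compiled regex.

    Every other character is taken literally, so '^', '+', '[' and so on
    have no special meaning here.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def answers_yes(label, patterns):
    """A question is answered 'yes' if any of its patterns matches the whole label."""
    return any(wildcard_to_regex(p).fullmatch(label) for p in patterns)


# Invented context-dependent label: previous^current+next, then other fields.
label = "k^ae-t+s=ih/A:0_1"
print(answers_yes(label, ["*-t+*"]))            # current phone is "t"
print(answers_yes(label, ["*-a+*", "*-i+*"]))   # current phone is "a" or "i"
```

Running every question over every label in this way is a quick way to spot questions that never (or always) answer yes for a given database.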
Are the label files only used together with the questions file?
The question file included in each HTS-demo_... release is designed for
each database.
You have to add/remove questions for your database.
Are all the questions used for all the stages of training or does
some of these, say f0 modelling, use only a subset of these?
(I guess that all are used and that the /A:, /B:, /C: et cetera
are there for human-readability and not for magic.)
All of the questions are used at every stage of clustering.
I know that the question file isn't readable for most people.
I sometimes went crazy when I edited it :-(
My utterance structure is different from AWB-english.
I don't have phrase relation and I most certainly don't use TOBI
intonation model for non-English (though admittedly TOBI has been applied
with success to languages other than English).
I cheated by making all the TOBI-related features return '0', among other things...
It might be that I fail to create any sensible F0.
I have trained some English voices without utterance, phrase and POS
information.
They were not so good, but not so bad.
If you can use a large amount of training data, these contexts are not
required.
But if a large amount of data is not available, I recommend that you add
this information manually and include these contexts.
Best regards
Heiga Zen (Byung-Ha Chun)
--
------------------------------------------------
Heiga Zen (in Japanese pronunciation)
Byung-Ha Chun (in Korean pronunciation)
Department of Computer Science and Engineering
Graduate School of Engineering
Nagoya Institute of Technology
Japan
e-mail: zen@xxxxxxxxxxxxxxxx
web: http://kt-lab.ics.nitech.ac.jp/~zen
------------------------------------------------