
[hts-users:00066] Re: HTS voice output problems


Hi Nicholas,

Nicholas Volk wrote:

I did `sox -r 16000 -w -s ms_272_16.raw ms_272_16.wav` to
give it a proper header. There's something that might be a vowel
just before 2 secs...
The new file is located at http://www.bitlips.fi/HTS/ms_272_16.wav .

I can hear it, thanks.
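
For anyone following along without sox, the header step can also be sketched in Python with the standard `wave` module. This is only a sketch under assumptions: it takes the raw file to be mono, 16-bit signed, little-endian PCM at 16 kHz (the sox command above states only the rate, sample width and signedness; mono and byte order are assumed), and `raw_to_wav` is a made-up helper name.

```python
import wave

def raw_to_wav(raw_bytes, out, rate=16000):
    """Wrap headerless 16-bit PCM samples in a WAV container.

    `out` may be a file name or a seekable binary file object.
    Assumes mono, little-endian samples (WAV PCM is little-endian).
    """
    with wave.open(out, "wb") as w:
        w.setnchannels(1)      # assumption: mono
        w.setsampwidth(2)      # 16-bit samples, as with sox -w
        w.setframerate(rate)   # 16000 Hz, as with sox -r 16000
        w.writeframes(raw_bytes)

# Example use (file names taken from the message above):
# with open("ms_272_16.raw", "rb") as f:
#     raw_to_wav(f.read(), "ms_272_16.wav")
```

If the raw samples were big-endian, they would need byte swapping before being written.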

There's something I must be doing wrong... I'll have to dig deeper.

I blindly copied the four win/*.win files for AWB.
Could this be a problem?

These files should be in little-endian format.
Please check the contents of these files with SPTK:
SPTK/bin/dmp +f ./mcep_dyn.win or SPTK/bin/dmp +f ./lf0_dyn.win
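
If SPTK is not at hand, a rough byte-order check can be sketched in Python. The sketch assumes each .win file is a headerless sequence of 4-byte floats (which is what `dmp +f` implies); the function names and the plausibility thresholds are invented here and are only a heuristic. The idea is that window coefficients such as -0.5, 0.0, 0.5 turn into denormals or huge values when read with the wrong byte order.

```python
import math
import struct

def read_floats(data, byteorder="<"):
    """Interpret raw bytes as consecutive 32-bit floats ('<' little, '>' big)."""
    count = len(data) // 4
    return list(struct.unpack(f"{byteorder}{count}f", data[: count * 4]))

def looks_plausible(values, limit=10.0, tiny=1e-6):
    """Heuristic: coefficients are finite, small, and either zero or not denormal-sized."""
    return all(
        math.isfinite(v) and abs(v) <= limit and (v == 0.0 or abs(v) >= tiny)
        for v in values
    )

def guess_byte_order(data):
    """Return 'little', 'big', or 'unknown' when the heuristic cannot decide."""
    little_ok = looks_plausible(read_floats(data, "<"))
    big_ok = looks_plausible(read_floats(data, ">"))
    if little_ok and not big_ok:
        return "little"
    if big_ok and not little_ok:
        return "big"
    return "unknown"

# Example use (file name taken from the message above):
# with open("mcep_dyn.win", "rb") as f:
#     print(guess_byte_order(f.read()))
```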

Since I have a background in Festival and know very little about
HTK, I'd like to check one thing:
The question files contain a bunch of yes/no-questions.

Yes.

Each question contains the set of regexp-like possible yes-answers.

HTK supports only '*' and '?'.
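
To make the matching concrete, here is a small Python sketch of how a question with several patterns could be tested against a label, treating '*' as "any run of characters (possibly empty)" and '?' as "exactly one character", with everything else literal. The helper names, the label and the patterns are made up for illustration; this is not HTK's own code.

```python
import re

def htk_pattern_to_regex(pattern):
    """Translate an HTK-style pattern ('*' and '?' only) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")           # any run of characters, possibly empty
        elif ch == "?":
            parts.append(".")            # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)

def question_matches(patterns, label):
    """A question answers 'yes' if any of its patterns matches the whole label."""
    return any(htk_pattern_to_regex(p).fullmatch(label) for p in patterns)

# Hypothetical label and patterns, for illustration only.
label = "pau^h-e+l=ou/A:0_0_0"
print(question_matches(["*-a+*", "*-e+*"], label))  # current phone is a or e
print(question_matches(["*/A:?_0_0"], label))       # single-character field after /A:
```

Note that brackets such as [ae] are treated literally here, since only '*' and '?' are wildcards.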

Are the label files only used together with the questions file?

The question file included in each HTS-demo_... release is designed for that release's database.
You have to add or remove questions to suit your own database.

Are all the questions used for all the stages of training, or do
some stages, say F0 modelling, use only a subset of them?
(I guess that all are used and that the /A:, /B:, /C: et cetera
are there for human-readability and not for magic.)

All of the questions are used at every stage of clustering.
I know that the file isn't very readable for most people.
I sometimes went crazy when I edited it :-(

My utterance structure is different from AWB-english.
I don't have a phrase relation and I most certainly don't use the ToBI
intonation model for non-English (though admittedly ToBI has been applied
with success to languages other than English).
I cheated by making all the ToBI-related features return '0', among other things...
It might be that I fail to create any sensible F0.

I have trained some English voices without utterance, phrase and POS information.
They were not so good, but not so bad.
If you can use a large amount of training data, these contexts are not required. But if a large amount of data is not available, I recommend that you attach this information manually and include these contexts.

Best regards

Heiga Zen (Byung-Ha Chun)

--
 ------------------------------------------------
  Heiga Zen     (in Japanese pronunciation)
  Byung-Ha Chun (in Korean pronunciation)

  Department of Computer Science and Engineering
  Graduate School of Engineering
  Nagoya Institute of Technology
  Japan

  e-mail: zen@xxxxxxxxxxxxxxxx
     web: http://kt-lab.ics.nitech.ac.jp/~zen
 ------------------------------------------------

