[hts-users:02576] Is it OK to use a few and long speech files for HTS training?


Dear all,

I am using 19 audio-book files (about 8 minutes per file, about 160 minutes in total) recorded by a single speaker, who is a native speaker of American English.
The audio files were converted to raw format.
The utt files were created with Festival.
I tried HTS versions 2.1.1 and 2.1; both show the same problem.
The problem I have now is the following.
In the initialization and re-estimation steps, the number of observation sequences loaded for a phoneme does not match the number of occurrences of that phoneme in the audio-book data that I am using.
For comparison, I ran the HTS-demo with the cmu_us_arctic_slt database.
With cmu_us_arctic_slt, the number of occurrences of a given phoneme (e.g. aa, oy) was the same as the number of observation sequences loaded in the initialization and re-estimation steps.
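
In case it helps to reproduce the comparison, here is a minimal Python sketch of how the phoneme occurrences could be counted on the label side. It assumes monophone label files in the HTK style, one segment per line with optional start and end times followed by the phoneme name (for example "0 1500000 aa"). The function names and the directory layout are only illustrative; they are not part of HTS or HTK.

```python
from collections import Counter
from pathlib import Path


def label_name(line):
    """Return the label name from one HTK-style label line, or None if empty."""
    fields = line.split()
    # Drop the optional leading start/end times, which are integers.
    while fields and fields[0].lstrip("-").isdigit():
        fields.pop(0)
    return fields[0] if fields else None


def count_phonemes(lines):
    """Count how often each phoneme label occurs in an iterable of label lines."""
    counts = Counter()
    for line in lines:
        name = label_name(line)
        if name is not None:
            counts[name] += 1
    return counts


def count_phonemes_in_dir(label_dir):
    """Sum the phoneme counts over every *.lab file in label_dir (assumed layout)."""
    total = Counter()
    for path in sorted(Path(label_dir).glob("*.lab")):
        with open(path, encoding="utf-8") as handle:
            total.update(count_phonemes(handle))
    return total
```

Calling count_phonemes_in_dir on the directory that holds the monophone labels gives a per-phoneme count that can be set against the number of observation sequences reported in the training log for the same phoneme.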

I am wondering whether this problem is caused by the size of the utt files (they are very large and long, so the *.lab files are also large and long).
Or could it be caused by the file names that I chose? (I use 01-02.utt, 01-03.utt, ... 01-10.utt, 02-01.utt, ..., 02.10.utt)

I would appreciate any advice on this problem.

Thank you in advance.

jangwon