Hi,

On 2007/08/07, at 22:22, Alexander Gutkin wrote:
> > > I've trained HMMs for HMM-based speech synthesis from 23,000 sentences.
> >
> > Oh, that's a lot! How many hours is it? I don't believe it had manual
> > segmentation :) Did you notice any real improvement with such a big
> > database? For example, Alan says that 500-1000 sentences are enough.
>
> I assume this was used by Junichi for the adaptation work, i.e. training
> the speaker-independent models. But, in general, I can't see anything
> wrong with training on a large database - as long as you can afford the
> resources... Probably there is no perfect recipe for the optimal database
> size - it's too speaker- and coverage-specific...
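As a back-of-envelope answer to the "how many hours is it?" question above, one can estimate the corpus duration from the sentence count. Note that the 5-second average sentence duration used here is my own assumption for read speech, not a figure from this thread:

```python
# Rough estimate of corpus duration from the number of sentences.
# avg_sentence_seconds is an assumed value (typical read-speech
# sentence length), NOT a figure reported in this thread.
num_sentences = 23_000
avg_sentence_seconds = 5.0  # assumption

total_hours = num_sentences * avg_sentence_seconds / 3600
print(f"~{total_hours:.0f} hours")  # prints "~32 hours" under this assumption
```

So 23,000 sentences would correspond to something on the order of a few tens of hours of speech, depending on the actual sentence lengths.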
Yes, the HMMs are speaker-independent models, but the clustering procedures
are the same as those for speaker-dependent models. In addition to the
speaker-independent models, I've trained several speaker-dependent HMMs,
from about 1 hour to about 13 hours of speech data. I've also heard that
Dr. Zen and Dr. Toda used much larger speech corpora for training
speaker-dependent HMMs in the past.

In my opinion, synthetic speech generated from HMMs trained on more than
3 hours of speech data sounds very good; it's much better than speech
generated from 1 hour of data.
Thus, I agree with Dr. Zen's comments. I do not think that 500-1000
sentences are enough at all. My impression is quite the opposite:
"As the amount of training data increases, the quality of the synthetic
speech gets better!"
Thanks,
Junichi