[hts-users:00774] Re: Questions on training and flat pitch pattern
- Subject: [hts-users:00774] Re: Questions on training and flat pitch pattern
- From: "Alexander Gutkin" <alexander.gutkin@xxxxxxxxx>
- Date: Wed, 8 Aug 2007 16:39:26 +0100
- Cc: "Junichi Yamagishi" <jyamagis@xxxxxxxxxxxx>
- Delivered-to: hts-users@xxxxxxxxxxxxxxx
I suppose the real question here is how to calculate the optimal
statistical coverage of a given corpus and figure out what the
``saturation'' curve looks like... I presume that for any given model,
the number of observations reaches some optimal N after which further
improvement is negligible (possibly similar to ASR)...
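For what it's worth, one rough way to eyeball such a saturation point is to plot cumulative distinct-context coverage against the number of sentences. A minimal sketch on synthetic data follows; the context labels and the Zipf-like frequency weights are made up for illustration and are not real HTS full-context labels:

```python
import random

def coverage_curve(sentences, step=100):
    """Return (num_sentences, distinct_contexts_seen) pairs every `step` sentences."""
    seen = set()
    curve = []
    for i, contexts in enumerate(sentences, 1):
        seen.update(contexts)
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve

# Hypothetical corpus: each "sentence" is a bag of context labels drawn
# from a Zipf-like pool, mimicking the skewed frequencies of phonetic contexts.
random.seed(0)
pool = [f"ctx{i}" for i in range(5000)]
weights = [1.0 / (rank + 1) for rank in range(len(pool))]
corpus = [random.choices(pool, weights=weights, k=30) for _ in range(2000)]

curve = coverage_curve(corpus, step=500)
for n, covered in curve:
    print(n, covered)
```

Because of the skewed context distribution, the marginal number of new contexts per added sentence shrinks quickly, which is exactly the flattening one would look for when deciding whether more data is still paying off.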
On 8/8/07, Junichi Yamagishi <jyamagis@xxxxxxxxxxxx> wrote:
> Hi,
>
> On 2007/08/07, at 22:22, Alexander Gutkin wrote:
> >>> I've trained HMMs for HMM-based speech synthesis from 23,000
> >>> sentences.
> >>
> >> Oh, that many! How many hours is that? I don't imagine it had
> >> manual segmentation :)
> >> Did you notice any real improvement with such a big database? For
> >> example, Alan says that 500-1000 sentences are enough.
> >>
> >
> > I assume this was used by Junichi for the adaptation work, i.e.
> > training the speaker-independent models. But, in general, I can't see
> > anything wrong with training on a large database, as long as you can
> > afford the resources... There is probably no perfect recipe for the
> > optimal database size; it's too speaker- and coverage-specific...
>
> Yes. The HMMs are speaker-independent models. But the clustering
> procedures are the same as those for the speaker-dependent models.
>
> In addition to the speaker-independent models, I've trained several
> speaker-dependent HMMs from about 1 hour to about 13 hours of speech
> data.
> I've also heard that Dr. Zen and Dr. Toda have used much larger speech
> corpora for training speaker-dependent HMMs in the past.
>
> In my opinion, synthetic speech generated from HMMs trained on more
> than 3 hours of speech data sounds very good; it's much better than
> that from 1 hour of speech data.
>
> Thus I agree with Dr. Zen's comments. I do not think that 500-1000
> sentences are enough at all.
> On the contrary, my impression is that
> "as the amount of training data increases, the quality of the
> synthetic speech gets better!"
>
> Thanks,
> Junichi
>
- References
-
- [hts-users:00764] Questions on training and flat pitch pattern, Lee Sillon
- [hts-users:00765] Re: Questions on training and flat pitch pattern, Heiga ZEN (Byung Ha CHUN)
- [hts-users:00766] Re: Questions on training and flat pitch pattern, Junichi Yamagishi
- [hts-users:00767] Re: Questions on training and flat pitch pattern, Nickolay V. Shmyrev
- [hts-users:00769] Re: Questions on training and flat pitch pattern, Alexander Gutkin
- [hts-users:00771] Re: Questions on training and flat pitch pattern, Junichi Yamagishi