[hts-users:00774] Re: Questions on training and flat pitch pattern
- Subject: [hts-users:00774] Re: Questions on training and flat pitch pattern
- From: "Alexander Gutkin" <alexander.gutkin@xxxxxxxxx>
- Date: Wed, 8 Aug 2007 16:39:26 +0100
- Cc: "Junichi Yamagishi" <jyamagis@xxxxxxxxxxxx>
- Delivered-to: hts-users@xxxxxxxxxxxxxxx
I suppose the real question here is how to calculate the optimal
statistical coverage of a given corpus and figure out what the
``saturation'' curve looks like... I presume that for any given model,
the number of observations reaches some optimal N after which further
improvement is negligible (possibly similar to ASR)...
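For what it's worth, one rough way to eyeball such a saturation point is to plot cumulative distinct-context coverage against the number of sentences. A minimal sketch on synthetic data follows; the context labels and the Zipf-like frequency weights are made up for illustration and are not real HTS full-context labels:

```python
import random

def coverage_curve(sentences, step=100):
    """Return (num_sentences, distinct_contexts_seen) pairs every `step` sentences."""
    seen = set()
    curve = []
    for i, contexts in enumerate(sentences, 1):
        seen.update(contexts)
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve

# Hypothetical corpus: each "sentence" is a bag of context labels drawn
# from a Zipf-like pool, mimicking the skewed frequencies of phonetic contexts.
random.seed(0)
pool = [f"ctx{i}" for i in range(5000)]
weights = [1.0 / (rank + 1) for rank in range(len(pool))]
corpus = [random.choices(pool, weights=weights, k=30) for _ in range(2000)]

curve = coverage_curve(corpus, step=500)
for n, covered in curve:
    print(n, covered)
```

Because of the skewed context distribution, the marginal number of new contexts per added sentence shrinks quickly, which is exactly the flattening one would look for when deciding whether more data is still paying off.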
On 8/8/07, Junichi Yamagishi <jyamagis@xxxxxxxxxxxx> wrote:
> Hi,
>
> On 2007/08/07, at 22:22, Alexander Gutkin wrote:
> >>> I've trained HMMs for HMM-based speech synthesis from 23,000
> >>> sentences.
> >>
> >> Oh, that many! How many hours is that? I don't imagine it had
> >> manual segmentation :)
> >> Did you notice any real improvement with such a big database? For
> >> example, Alan says that 500-1000 sentences are enough.
> >>
> >
> > I assume this was used by Junichi for the adaptation work, i.e.
> > training the speaker-independent models. But, in general, I can't see
> > anything wrong with training on a large database, as long as you can
> > afford the resources... There is probably no perfect recipe for the
> > optimal database size; it's too speaker- and coverage-specific...
>
> Yes. The HMMs are speaker-independent models. But the clustering
> procedures are the same as those for the speaker-dependent models.
>
> In addition to the speaker-independent models, I've trained several
> speaker-dependent HMMs from about 1 hour to about 13 hours of speech
> data.
> I've also heard that Dr. Zen and Dr. Toda have used much larger speech
> corpora for training speaker-dependent HMMs in the past.
>
> In my opinion, synthetic speech generated from HMMs trained on more
> than 3 hours of speech data sounds very good; it's much better than
> that from 1 hour of speech data.
>
> Thus I agree with Dr. Zen's comments. I do not think that 500-1000
> sentences are enough at all.
> On the contrary, my impression is that
> "as the amount of training data increases, the quality of the
> synthetic speech gets better!"
>
> Thanks,
> Junichi
>
- References
-
- [hts-users:00764] Questions on training and flat pitch pattern, Lee Sillon
- [hts-users:00765] Re: Questions on training and flat pitch pattern, Heiga ZEN (Byung Ha CHUN)
- [hts-users:00766] Re: Questions on training and flat pitch pattern, Junichi Yamagishi
- [hts-users:00767] Re: Questions on training and flat pitch pattern, Nickolay V. Shmyrev
- [hts-users:00769] Re: Questions on training and flat pitch pattern, Alexander Gutkin
- [hts-users:00771] Re: Questions on training and flat pitch pattern, Junichi Yamagishi