[hts-users:03301] Re: How to improve performance of TTS without manual phoneme alignment


Maybe the diversity of my corpus is what caused the poor
performance. My corpus consists of recordings from 300 speakers, with about
200 sentences (about 10 minutes) per speaker. The corpus is not
labeled manually. I built an average voice model using about 3000
sentences, then used the sentences of one speaker to perform adaptation.
At first I thought alignment accuracy was the reason the
performance was bad, but I think you are right: the embedded training
procedure doesn't rely on the phoneme boundaries. Now I'm not sure what
kind of method could improve the performance of the TTS system. I'm
afraid the number of adaptation sentences is too small;
however, I only have 200 sentences for each speaker.
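
The consistency hlwang mentions in the quoted message below (monophone
sequence versus full-context sequence) can be checked mechanically. The
following is only a rough Python sketch and is not part of HTS. It assumes
HTS-style full-context labels in which the current phone sits between '-'
and '+' (for example xx^sil-hh+ax=l@...), optionally preceded by start and
end times. The function names are made up for illustration.

```python
import re

# Assumption: in a full-context label the current phone is the text between
# the first '-' and the following '+', e.g. "xx^sil-hh+ax=l@..." -> "hh".
_CENTER = re.compile(r"-([^-+]+)\+")


def center_phone(label_line):
    """Return the current phone from one label line, or None if the line is blank.

    Works for monophone lines ("0 100 hh" or "hh") and for full-context lines.
    """
    fields = label_line.split()
    if not fields:
        return None
    label = fields[-1]  # drop optional start/end times
    match = _CENTER.search(label)
    # Monophone labels contain no "-phone+" pattern, so use the label itself.
    return match.group(1) if match else label


def phone_sequence(lines):
    """Extract the phone sequence from an iterable of label lines."""
    phones = (center_phone(line) for line in lines)
    return [p for p in phones if p is not None]


def first_mismatch(mono_lines, full_lines):
    """Return (index, mono_phone, full_phone) for the first difference, else None."""
    mono = phone_sequence(mono_lines)
    full = phone_sequence(full_lines)
    for i, (a, b) in enumerate(zip(mono, full)):
        if a != b:
            return i, a, b
    if len(mono) != len(full):
        # One sequence is a prefix of the other; report where it runs out.
        n = min(len(mono), len(full))
        return (n,
                mono[n] if n < len(mono) else None,
                full[n] if n < len(full) else None)
    return None
```

Running first_mismatch on each utterance's monophone and full-context
label files (read with readlines()) would flag utterances whose two label
sets disagree. The same comparison could be applied between the labels used
in training and those generated at synthesis time.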

2012/5/13 huanliangwang <huanliangwang@xxxxxxx>:
> Hi,
>     Our experiments lead to the same conclusion as yours. The initial
> phone boundaries have very little effect on the synthesis results if you use
> the embedded training procedure. I think consistency between the monophone
> sequence and the full-context phone sequence, as well as between the training
> stage and the synthesis stage, may be more important for synthesis performance.
>
> Best Regards,
>
> hlwang
>
>
>
> At 2012-05-13 10:05:26,"那兴宇" <nxy-yzqs@xxxxxxx> wrote:
>
> My opinion:
> The HTS system uses phoneme boundaries only in the initialization stage of
> the monophone models, which affects how the embedded training of the
> full-context models converges. But in my experiments, phoneme
> alignment does not matter that much. I do not know your corpus
> size; maybe using more training data would help.
> --
> Xingyu Na (那兴宇)
> Beijing Institute of Technology
> naxy(at)bit.edu.cn
> asr.naxingyu(at)gmail.com
> naxingyu at {facebook, twitter, linkedin}
>
>
> At 2012-05-13 03:29:51,"Kwan Lisa" <lisakwan1102@xxxxxxxxx> wrote:
>>Hi,
>>
>>I'm using a corpus without manual phoneme alignment. Thus, I performed
>>forced alignment to get the phoneme boundary information.
>>However, the performance of the TTS system was not good. The TTS system
>>seems to be very sensitive to the accuracy of the phoneme boundary
>>information.
>>Is there any method that could improve the performance of TTS without
>>manual phoneme alignment?
>>
>>--
>>Lisa Kwan
>>lisakwan1102(at)gmail.com
>>

-- 
Lisa Kwan
lisakwan1102(at)gmail.com

Follow-Ups
[hts-users:03310] Re: How to improve performance of TTS without manual phoneme alignment, Junichi Yamagishi
References
[hts-users:03295] How to improve performance of TTS without manual phoneme alignment, Kwan Lisa
[hts-users:03299] Re: How to improve performance of TTS without manual phoneme alignment, 那兴宇
[hts-users:03300] Re: How to improve performance of TTS without manual phoneme alignment, huanliangwang