
[hts-users:03300] Re: How to improve performance of TTS without manual phoneme alignment


Hi,
    Our experiments lead to the same conclusion as yours. With the embedded training procedure, the initial phone boundaries have only a very limited effect on the synthesis results. I think consistency between the monophone sequence and the full-context phone sequence, and between the training stage and the synthesis stage, may matter more for synthesis performance.
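As a rough illustration of the consistency point above, here is a minimal Python sketch that compares the phone sequence in a monophone label file with the one in the matching full-context label file. It assumes the full-context names follow the layout used in the common HTS demo scripts ("ll^l-c+r=rr@..."), so the current phone sits between '-' and '+'. The function names phone_of and first_mismatch are made up for this example and are not part of HTS or HTK.

```python
import re

# Assumption: full-context labels look like "ll^l-c+r=rr@...", as in the
# common HTS demo label layout, so the current phone is between '-' and '+'.
CENTER_PHONE = re.compile(r"-([^+\-]+)\+")


def phone_of(label_line):
    """Return the phone named on one label line (monophone or full-context)."""
    name = label_line.split()[-1]  # last field is the label; times are optional
    match = CENTER_PHONE.search(name)
    # A plain monophone name has no "-c+" pattern, so it is returned unchanged.
    return match.group(1) if match else name


def first_mismatch(mono_lines, full_lines):
    """Return the index of the first differing phone, or None if consistent."""
    mono = [phone_of(line) for line in mono_lines if line.strip()]
    full = [phone_of(line) for line in full_lines if line.strip()]
    for i, (m, f) in enumerate(zip(mono, full)):
        if m != f:
            return i
    if len(mono) != len(full):
        # One sequence is a strict prefix of the other.
        return min(len(mono), len(full))
    return None
```

Running first_mismatch over each utterance (reading both files with readlines()) before training gives a quick check that the two label sets describe the same phone sequence. The same check could be applied to the labels generated at synthesis time, if your front end writes both forms.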

Best Regards,

hlwang


At 2012-05-13 10:05:26,"那兴宇" <nxy-yzqs@xxxxxxx> wrote:
My opinion:
The HTS system uses phoneme boundaries only in the initialization stage of the monophone models, which affects how the embedded training of the full-context models converges. But in my experiments, the phoneme alignment did not affect the results that much. I do not know what your corpus size is; using more training data may help.
--
Xingyu Na (那兴宇)
Beijing Institute of Technology
naxy(at)bit.edu.cn
asr.naxingyu(at)gmail.com
naxingyu at {facebook, twitter, linkedin}

At 2012-05-13 03:29:51,"Kwan Lisa" <lisakwan1102@xxxxxxxxx> wrote:
>Hi,
>
>I'm using a corpus without manual phoneme alignment. Thus, I performed
>forced alignment to get the phoneme boundary information.
>However, the performance of the TTS system was not good. TTS system
>seems to be very sensitive to the accuracy of the phoneme boundary
>information.
>Is there any method that could improve the performance of TTS without
>manual phoneme alignment?
>
>--
>Lisa Kwan
>lisakwan1102(at)gmail.com
>





Follow-Ups
[hts-users:03301] Re: How to improve performance of TTS without manual phoneme alignment, Kwan Lisa
References
[hts-users:03295] How to improve performance of TTS without manual phoneme alignment, Kwan Lisa
[hts-users:03299] Re: How to improve performance of TTS without manual phoneme alignment, 那兴宇