My opinion:
The HTS system uses phoneme boundaries only in the initialization stage of the monophone models. That initialization can affect what the embedded training of the full-context models converges to, but in my experiments the phoneme alignment did not affect the result that much. I do not know how large your corpus is; using more training data might help.
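
To make the point about initialization concrete, here is a toy sketch (not HTS code). It assumes one scalar feature per frame and a single Gaussian per phoneme; the function name and the data are made up for illustration. It shows that the boundaries only decide which frames seed each monophone's initial statistics, so a small boundary error only shifts the starting point that embedded training later refines.

```python
from statistics import mean, pvariance


def init_monophone_stats(frames, segments):
    """Estimate an initial (mean, variance) per phoneme from labelled segments.

    frames:   list of floats, one toy scalar feature per frame
    segments: list of (phoneme, start_frame, end_frame), end exclusive
    """
    pooled = {}
    for phone, start, end in segments:
        # Boundaries are used only here: to pick the frames for each phoneme.
        pooled.setdefault(phone, []).extend(frames[start:end])
    return {phone: (mean(vals), pvariance(vals)) for phone, vals in pooled.items()}


# Hypothetical data: phoneme "a" sits near 0.0, phoneme "b" near 1.0.
frames = [0.0, 0.1, -0.1, 0.0, 1.0, 1.1, 0.9, 1.0]

exact = [("a", 0, 4), ("b", 4, 8)]    # correct boundary at frame 4
shifted = [("a", 0, 5), ("b", 5, 8)]  # forced-alignment error of one frame

exact_stats = init_monophone_stats(frames, exact)
shifted_stats = init_monophone_stats(frames, shifted)

for phone in ("a", "b"):
    print(phone, "exact:", exact_stats[phone], "shifted:", shifted_stats[phone])
```

In this toy case the one-frame error moves the initial mean of "a" from 0.0 to 0.2 while "b" stays near 1.0. The models still start in the right region, which is the kind of offset that later re-estimation can usually correct.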
--
Xingyu Na (那兴宇)
Beijing Institute of Technology
naxy(at)bit.edu.cn
asr.naxingyu(at)gmail.com
naxingyu at {facebook, twitter, linkedin}
At 2012-05-13 03:29:51,"Kwan Lisa" <lisakwan1102@xxxxxxxxx> wrote:
>Hi,
>
>I'm using a corpus without manual phoneme alignment. Thus, I performed
>forced alignment to get the phoneme boundary information.
>However, the performance of the TTS system was not good. The TTS system
>seems to be very sensitive to the accuracy of the phoneme boundary
>information.
>Is there any method that could improve the performance of TTS without
>manual phoneme alignment?
>
>--
>Lisa Kwan
>lisakwan1102(at)gmail.com
>