[hts-users:03300] Re: How to improve performance of TTS without manual phoneme alignment

Subject: [hts-users:03300] Re: How to improve performance of TTS without manual phoneme alignment

From: huanliangwang <huanliangwang@xxxxxxx>

Date: Sun, 13 May 2012 15:21:19 +0800 (CST)

Delivered-to: hts-users@xxxxxxxxxxxxxxx

Hi,
Our experiments led to the same conclusion as yours. The initial phone boundaries have a very limited effect on the synthesis results if you use the embedded training procedure. I think consistency between the monophone sequence and the full-context phone sequence, and between the training stage and the synthesis stage, may matter more for synthesis performance.
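
A minimal sketch of the consistency check mentioned above, for anyone who wants to try it. It assumes HTK-style label lines ("[start end] name") and the HTS demo convention that the current phoneme of a full-context label sits between '-' and '+'. The function names are made up for this example and are not part of HTS.

```python
import re

# Assumption: full-context labels follow the HTS demo convention, where the
# current phoneme sits between '-' and '+' (e.g. "x^sil-h+e=l@...").
CENTER_PHONE = re.compile(r"-([^-+]+)\+")


def label_field(line):
    """Return the label name from an HTK-style line: '[start end] name'."""
    return line.split()[-1]


def center_phone(full_context_label):
    """Extract the current phoneme from a full-context label."""
    match = CENTER_PHONE.search(full_context_label)
    if match is None:
        raise ValueError(f"no center phone in {full_context_label!r}")
    return match.group(1)


def find_mismatches(mono_lines, full_lines):
    """List (index, mono, full) triples where the two sequences disagree.

    A length mismatch is reported as a single entry with index -1.
    """
    mono = [label_field(l) for l in mono_lines if l.strip()]
    full = [center_phone(label_field(l)) for l in full_lines if l.strip()]
    if len(mono) != len(full):
        return [(-1, f"{len(mono)} labels", f"{len(full)} labels")]
    return [(i, m, f) for i, (m, f) in enumerate(zip(mono, full)) if m != f]
```

Running find_mismatches over the monophone and full-context label files of each utterance, for both the training labels and the labels generated at synthesis time, should return an empty list everywhere if the phone sequences agree.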

Best Regards,

hlwang

At 2012-05-13 10:05:26,"那兴宇" <nxy-yzqs@xxxxxxx> wrote:

My opinion:
The HTS system uses phoneme boundaries only when initializing the monophone models, which in turn affects how the embedded training of the full-context models converges. In my experiments, however, the accuracy of the phoneme alignment did not affect the results much. I do not know your corpus size, but using more training data might help.
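
One way to check this on your own corpus is to replace the aligned boundaries with a deliberately crude uniform segmentation, retrain, and compare the synthesized speech. The sketch below only builds such uniform boundaries. It assumes HTK label times in 100 ns units and a 5 ms frame shift (50000 units); adjust that to your setup. The function name is made up for this example and is not an HTS tool.

```python
def uniform_alignment(phones, n_frames, frame_shift=50000):
    """Split n_frames evenly across phones.

    Returns (start, end, phone) tuples with times in HTK 100 ns units.
    frame_shift=50000 corresponds to an assumed 5 ms frame shift.
    """
    if not phones:
        raise ValueError("empty phone sequence")
    if n_frames < len(phones):
        raise ValueError("fewer frames than phones")
    n = len(phones)
    # Integer division keeps boundaries on frame edges; because
    # n_frames >= n, every phone gets at least one frame.
    bounds = [i * n_frames // n for i in range(n + 1)]
    return [
        (bounds[i] * frame_shift, bounds[i + 1] * frame_shift, phone)
        for i, phone in enumerate(phones)
    ]


def to_label_lines(segments):
    """Format segments as HTK-style label lines: 'start end name'."""
    return [f"{start} {end} {phone}" for start, end, phone in segments]
```

If the voices trained from the forced-aligned labels and from these uniform labels sound similar, the boundaries are probably not the main problem.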
--

Xingyu Na (那兴宇)

Beijing Institute of Technology

naxy(at)bit.edu.cn

asr.naxingyu(at)gmail.com

naxingyu at {facebook, twitter, linkedin}
At 2012-05-13 03:29:51,"Kwan Lisa" <lisakwan1102@xxxxxxxxx> wrote:
>Hi,
>
>I'm using a corpus without manual phoneme alignment. Thus, I performed
>forced alignment to get the phoneme boundary information.
>However, the performance of the TTS system was not good. The TTS system
>seems to be very sensitive to the accuracy of the phoneme boundary
>information.
>Is there any method that could improve the performance of TTS without
>manual phoneme alignment?
>
>--
>Lisa Kwan
>lisakwan1102(at)gmail.com
>