
[hts-users:00765] Re: Questions on training and flat pitch pattern


Hi,

Lee Sillon wrote (2007/08/08 0:10):

1. If the training corpus is larger (for example, 10,000 sentences), the training process crashes because of memory consumption (about 2 GB). Can this be solved?

I guess tree-based clustering (HHEd) consumes a huge amount of memory for larger training data. You can use the low-memory implementation of tree-based clustering by specifying the -r option.
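A minimal sketch of what such an invocation might look like, assuming a typical HTK-style HHEd command line; the file names (cxc.hed, clustered.mmf, fullcontext.list, out_dir) are hypothetical placeholders, and only the -r flag is the HTS-specific addition mentioned above:

```shell
# Hypothetical HHEd call using the low-memory tree-based clustering
# implementation (-r). Replace the placeholder file names with your
# own model file, edit script, and model list.
HHEd -r -B \
     -H clustered.mmf \      # input/output master macro file (placeholder)
     -M out_dir \            # directory for the clustered models (placeholder)
     cxc.hed fullcontext.list
```

The other options (-B for binary output, -H/-M for model I/O) are standard HTK; only -r changes the clustering implementation, trading speed for a much smaller memory footprint.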

2. How many sentences are needed to train a good model? Are 1,000 sentences enough?

I think 1,000 sentences (about 1 hour?) is not enough.
In last year's Blizzard Challenge 2006 we used 5 hours of speech, and this year we used 7 hours.

3. In some papers, POS is considered because it influences pitch patterns a lot. But in the demo it is ignored. Is that one of the reasons for the flat pitch pattern?

I don't think so, because broad POS has a larger effect on F0 patterns than detailed POS.
In Japanese we checked the effect of broad and detailed POS and found that even broad POS had only a small effect on the final synthesized speech quality.
So the current demo uses only broad POS (gPOS in Festival).

Best regards,

Heiga ZEN (Byung Ha CHUN)

--
------------------------------------------------
Heiga ZEN     (in Japanese pronunciation)
Byung Ha CHUN (in Korean pronunciation)

Department of Computer Science and Engineering
Nagoya Institute of Technology
Gokiso-cho, Showa-ku, Nagoya 466-8555 Japan

http://www.sp.nitech.ac.jp/~zen
------------------------------------------------

Follow-Ups
[hts-users:00766] Re: Questions on training and flat pitch pattern, Junichi Yamagishi
[hts-users:00780] Increasing Number of Mixture Components, Tamer Fares
References
[hts-users:00764] Questions on training and flat pitch pattern, Lee Sillon