
[hts-users:00200] Re: emotional speech synthesis using hts


Hi,

liulei_198216@xxxxxxxxxxx wrote:

I have read some papers about emotional speech synthesis, and now I know that HTS uses "model interpolation" and "speaker adaptation" to synthesize emotional speech and speech with various styles.

Yes.

About "speaker adaptation": it means that to synthesize speech with various styles, we must convert the speech features, including the spectrum, F0, and duration.

Actually, we do not need to convert the speech features themselves; we need to convert the "statistics" (model parameters) of these features.
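To illustrate the distinction, here is a minimal sketch (with made-up numbers, not HTS parameters): instead of modifying every generated F0 frame, we modify the model statistics once, and every utterance synthesized from the model changes accordingly.

```python
# Hypothetical "statistics conversion" for a single log-F0 Gaussian.
# The numbers are illustrative only; real HTS models have per-state,
# context-dependent distributions.

neutral_logf0_mean = 4.8   # model parameter (a statistic), not a feature frame
neutral_logf0_var = 0.04

# Convert the statistics, e.g. toward a higher-pitched, more variable
# style: raise the mean and widen the variance.
happy_logf0_mean = neutral_logf0_mean + 0.3
happy_logf0_var = neutral_logf0_var * 1.5
```

All F0 trajectories generated from the converted model are affected at once, which is why converting statistics rather than individual feature frames is the natural operation here.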

But when I want to synthesize emotional speech, is it necessary to convert the spectrum?
Is it enough to obtain emotional speech by converting only F0 and duration?

In my opinion, converting the spectrum will help to synthesize emotional speech.

In addition, how can I get real-time emotional conversion, for example from sad to happy? Certainly we can use "model interpolation" and "speaker adaptation", but they need time in the training part.

For speaker interpolation you have to prepare a number of models using sufficient emotional speech samples.
Recording speech and training HMMs may take some time.
On the other hand, for speaker adaptation you only need one set of HMMs trained using neutral speech and a few emotional speech samples for adaptation.
Speaker (emotion) adaptation can be done off-line, so synthesizing emotional speech does not require any additional time.
You can also use adapted models for interpolation.
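The interpolation step itself is cheap; for single-Gaussian state output distributions it can be sketched as a weighted combination of the two models' means and variances. This is an illustrative simplification (function and variable names are mine, and real HTS interpolation handles full context-dependent model sets):

```python
# Hypothetical sketch of interpolating the output distributions of two
# emotion-dependent models, e.g. "sad" (A) and "happy" (B), assuming a
# single diagonal Gaussian per state.

def interpolate_gaussian(mean_a, var_a, mean_b, var_b, alpha):
    """Linearly interpolate two Gaussians.

    alpha = 0.0 reproduces model A, alpha = 1.0 reproduces model B;
    intermediate values blend the two styles.
    """
    mean = [(1.0 - alpha) * ma + alpha * mb for ma, mb in zip(mean_a, mean_b)]
    var = [(1.0 - alpha) * va + alpha * vb for va, vb in zip(var_a, var_b)]
    return mean, var

# Example: a point halfway between the two styles.
m, v = interpolate_gaussian([5.0, 5.2], [0.1, 0.1], [5.6, 5.8], [0.2, 0.2], 0.5)
```

Because only the model parameters are combined, the interpolation weight can be changed at synthesis time without any retraining, which is what makes gradual sad-to-happy transitions feasible.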

does anyone know "speech synthesis driven by emotional function " ?

I don't know what "emotional function" means.
In the HMM-based speech synthesis system with MLLR-based speaker (emotion) adaptation, I think the linear transformation matrices for the means and variances of the HMMs can be viewed as functions representing the relationship between neutral and emotional speech.
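For the mean parameters, the MLLR transform has the form mu' = A mu + b. A toy sketch of applying such a transform (the matrix A and bias b below are made-up example values; in practice they are estimated from the emotional adaptation data):

```python
# Minimal sketch of an MLLR-style mean transform mu' = A mu + b for a
# single HMM state mean vector. A and b are illustrative values, not
# estimated transforms.

def mllr_adapt_mean(mu, A, b):
    """Apply the linear transform mu' = A mu + b to a neutral-style mean."""
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)]

mu_neutral = [5.0, 5.2]                 # e.g. a 2-dim mean for illustration
A = [[1.1, 0.0], [0.0, 1.05]]           # example transformation matrix
b = [0.2, -0.1]                         # example bias vector
mu_emotional = mllr_adapt_mean(mu_neutral, A, b)
```

In this view, (A, b) plays the role of a learned function mapping neutral-speech statistics to emotional-speech statistics, which may be what the original poster had in mind by an "emotional function".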

Best regards,

Heiga Zen (Byung Ha Chun)

--
 ------------------------------------------------
  Heiga ZEN     (in Japanese pronunciation)
  Byung-Ha CHUN (in Korean pronunciation)

  Department of Computer Science and Engineering
  Graduate School of Engineering
  Nagoya Institute of Technology
  Japan

  e-mail: zen@xxxxxxxxxxxxxxxx
     web: http://kt-lab.ics.nitech.ac.jp/~zen
 ------------------------------------------------


Follow-Ups
[hts-users:00201] Re: emotional speech synthesis using hts, 刘 磊
References
[hts-users:00199] emotional speech synthesis using hts, 刘 磊