
[hts-users:00200] Re: emotional speech synthesis using hts


Hi,

liulei_198216@xxxxxxxxxxx wrote:

I have read some papers about emotional speech synthesis, and now I know that HTS uses "model interpolation" and "speaker adaptation" to synthesize emotional speech and speech with various styles.

Yes.

About "speaker adaptation": it means that to synthesize speech with various styles, we must convert the speech features, including the spectrum, F0, and duration.

Actually, we do not need to convert the speech features themselves; we need to convert the "statistics" (model parameters) of these features.
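To illustrate the distinction, here is a minimal sketch (with made-up numbers, not HTS parameters): instead of modifying every generated F0 frame, we modify the model statistics once, and every utterance synthesized from the model changes accordingly.

```python
# Hypothetical "statistics conversion" for a single log-F0 Gaussian.
# The numbers are illustrative only; real HTS models have per-state,
# context-dependent distributions.

neutral_logf0_mean = 4.8   # model parameter (a statistic), not a feature frame
neutral_logf0_var = 0.04

# Convert the statistics, e.g. toward a higher-pitched, more variable
# style: raise the mean and widen the variance.
happy_logf0_mean = neutral_logf0_mean + 0.3
happy_logf0_var = neutral_logf0_var * 1.5
```

All F0 trajectories generated from the converted model are affected at once, which is why converting statistics rather than individual feature frames is the natural operation here.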

But when I want to synthesize emotional speech, is it necessary to convert the spectrum?
Is it enough to obtain emotional speech by converting only F0 and duration?

In my opinion, converting the spectrum will help to synthesize emotional speech.

In addition, how can I get real-time emotional conversion, for example from sad to happy? Certainly we can use "model interpolation" and "speaker adaptation", but they need time in the training part.

For speaker interpolation you have to prepare a number of models using sufficient emotional speech samples.
Recording speech and training HMMs may take some time.
On the other hand, for speaker adaptation you only need one set of HMMs trained using neutral speech and a few emotional speech samples for adaptation.
Speaker (emotion) adaptation can be done off-line, so synthesizing emotional speech does not require any additional time.
You can also use adapted models for interpolation.
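The interpolation step itself is cheap; for single-Gaussian state output distributions it can be sketched as a weighted combination of the two models' means and variances. This is an illustrative simplification (function and variable names are mine, and real HTS interpolation handles full context-dependent model sets):

```python
# Hypothetical sketch of interpolating the output distributions of two
# emotion-dependent models, e.g. "sad" (A) and "happy" (B), assuming a
# single diagonal Gaussian per state.

def interpolate_gaussian(mean_a, var_a, mean_b, var_b, alpha):
    """Linearly interpolate two Gaussians.

    alpha = 0.0 reproduces model A, alpha = 1.0 reproduces model B;
    intermediate values blend the two styles.
    """
    mean = [(1.0 - alpha) * ma + alpha * mb for ma, mb in zip(mean_a, mean_b)]
    var = [(1.0 - alpha) * va + alpha * vb for va, vb in zip(var_a, var_b)]
    return mean, var

# Example: a point halfway between the two styles.
m, v = interpolate_gaussian([5.0, 5.2], [0.1, 0.1], [5.6, 5.8], [0.2, 0.2], 0.5)
```

Because only the model parameters are combined, the interpolation weight can be changed at synthesis time without any retraining, which is what makes gradual sad-to-happy transitions feasible.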

does anyone know "speech synthesis driven by emotional function " ?

I don't know what "emotional function" means.
In the HMM-based speech synthesis system with MLLR-based speaker (emotion) adaptation, I think the linear transformation matrices for the means and variances of the HMMs can be viewed as functions representing the relationship between neutral and emotional speech.
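For the mean parameters, the MLLR transform has the form mu' = A mu + b. A toy sketch of applying such a transform (the matrix A and bias b below are made-up example values; in practice they are estimated from the emotional adaptation data):

```python
# Minimal sketch of an MLLR-style mean transform mu' = A mu + b for a
# single HMM state mean vector. A and b are illustrative values, not
# estimated transforms.

def mllr_adapt_mean(mu, A, b):
    """Apply the linear transform mu' = A mu + b to a neutral-style mean."""
    return [sum(a_ij * m_j for a_ij, m_j in zip(row, mu)) + b_i
            for row, b_i in zip(A, b)]

mu_neutral = [5.0, 5.2]                 # e.g. a 2-dim mean for illustration
A = [[1.1, 0.0], [0.0, 1.05]]           # example transformation matrix
b = [0.2, -0.1]                         # example bias vector
mu_emotional = mllr_adapt_mean(mu_neutral, A, b)
```

In this view, (A, b) plays the role of a learned function mapping neutral-speech statistics to emotional-speech statistics, which may be what the original poster had in mind by an "emotional function".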

Best regards,

Heiga Zen (Byung Ha Chun)

--
 ------------------------------------------------
  Heiga ZEN     (in Japanese pronunciation)
  Byung-Ha CHUN (in Korean pronunciation)

  Department of Computer Science and Engineering
  Graduate School of Engineering
  Nagoya Institute of Technology
  Japan

  e-mail: zen@xxxxxxxxxxxxxxxx
     web: http://kt-lab.ics.nitech.ac.jp/~zen
 ------------------------------------------------


Follow-Ups
[hts-users:00201] Re: emotional speech synthesis using hts, 刘 磊
References
[hts-users:00199] emotional speech synthesis using hts, 刘 磊