From: "Heiga ZEN (Byung Ha CHUN)" <zen@xxxxxxxxxxxxxxxx>
Reply-To: hts-users@xxxxxxxxxxxxxxxxxxxxxxxxx
To: hts-users@xxxxxxxxxxxxxxxxxxxxxxxxx
Subject: [hts-users:00200] Re: emotional speech synthesis using hts
Date: Wed, 22 Feb 2006 23:11:47 +0900
Hi,
liulei_198216@xxxxxxxxxxx wrote:
> I have read some papers about emotional speech synthesis, and now I
> know that HTS uses "model interpolation" and "speaker adaptation"
> to synthesize emotional speech and speech in various styles.
Yes.
> About "speaker adaptation": does it mean that to synthesize
> speech in various styles, we must convert speech features
> including spectrum, F0, and duration?
Actually, we do not need to convert the speech features themselves;
we need to convert the "statistics" (model parameters) of these features.
> But when I want to synthesize emotional speech, is it necessary to
> convert the spectrum? Is it enough to obtain emotional speech by
> converting only F0 and duration?
In my opinion, converting the spectrum will help to synthesize
emotional speech.
> In addition, how can I get a real-time emotional conversion, for
> example from sad to happy?
Certainly we can use "model interpolation" and "speaker adaptation,"
but both require time in the training stage.
For speaker interpolation you have to prepare a number of models
trained on sufficient emotional speech samples; recording speech
and training HMMs may take some time.
On the other hand, for speaker adaptation you only need one set of
HMMs trained using neutral speech and a few emotional speech samples
for adaptation.
Speaker (emotion) adaptation can be done off-line, so synthesizing
emotional speech does not require any additional time.
You can also use adapted models for interpolation.
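As a rough illustration of the interpolation idea (this is a toy sketch, not HTS code; the function name, the two-dimensional "sad"/"happy" Gaussians, and the per-stream weights are all made up), interpolating between emotion-dependent models amounts to taking weighted combinations of their Gaussian statistics. A simple scheme, assuming diagonal covariances and weights that sum to one:

```python
import numpy as np

def interpolate_gaussians(means, variances, weights):
    """Interpolate single-Gaussian output distributions.

    The interpolated mean is the weighted sum of the component
    means, and the interpolated variance the weighted sum of the
    component variances (one simple interpolation scheme among
    several; weights are normalised to sum to one).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalise the interpolation weights
    mean = sum(wi * m for wi, m in zip(w, means))
    var = sum(wi * v for wi, v in zip(w, variances))
    return mean, var

# Toy "sad" and "happy" models (numbers are illustrative only)
mu_sad, var_sad = np.array([1.0, 2.0]), np.array([0.5, 0.5])
mu_happy, var_happy = np.array([3.0, 0.0]), np.array([1.0, 1.0])

# Morph gradually from sad to happy by sweeping the weight
for a in (0.0, 0.5, 1.0):
    mu, var = interpolate_gaussians([mu_sad, mu_happy],
                                    [var_sad, var_happy],
                                    [1.0 - a, a])
```

Since the models are prepared beforehand, only the weights need to change at synthesis time, which is what makes the sad-to-happy morphing cheap once training is done.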
> Does anyone know about "speech synthesis driven by an emotional
> function"?
I don't know what "emotional function" is.
In the HMM-based speech synthesis system with MLLR-based speaker
(emotion) adaptation, I think the linear transformation matrices for
the means and variances of the HMMs can be viewed as functions
representing the relationship between neutral and emotional speech.
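To make the "linear transformation as a function" view concrete (a minimal sketch, not the actual MLLR estimation in HTK/HTS; the matrix A, bias b, and the neutral mean are invented numbers, not estimated values), an MLLR mean transform maps a neutral-speech mean to an emotional one as mu' = A mu + b:

```python
import numpy as np

def mllr_adapt_mean(mu, A, b):
    """Apply an MLLR-style mean transform: mu' = A @ mu + b.

    In practice A (d x d) and b (d,) would be estimated from a
    small amount of emotional adaptation data; here they are
    fixed toy values for illustration.
    """
    return A @ mu + b

# Toy neutral-speech mean and an invented transform
mu_neutral = np.array([1.0, 2.0, 3.0])
A = np.eye(3) * 1.1              # slight scaling
b = np.array([0.2, -0.1, 0.0])   # small bias
mu_emotional = mllr_adapt_mean(mu_neutral, A, b)
```

The same pair (A, b), estimated once off-line from a few emotional samples, can then be applied to every Gaussian mean in the neutral model set, which is why adaptation needs far less emotional data than training emotion-dependent models from scratch.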
Best regards,
Heiga Zen (Byung Ha Chun)
--
------------------------------------------------
Heiga ZEN (in Japanese pronunciation)
Byung-Ha CHUN (in Korean pronunciation)
Department of Computer Science and Engineering
Graduate School of Engineering
Nagoya Institute of Technology
Japan
e-mail: zen@xxxxxxxxxxxxxxxx
web: http://kt-lab.ics.nitech.ac.jp/~zen
------------------------------------------------