[hts-users:00200] Re: emotional speech synthesis using hts
- Subject: [hts-users:00200] Re: emotional speech synthesis using hts
- From: "Heiga ZEN (Byung Ha CHUN)" <zen@xxxxxxxxxxxxxxxx>
- Date: Wed, 22 Feb 2006 23:11:47 +0900
Hi,
liulei_198216@xxxxxxxxxxx wrote:
> I have read some papers about emotional speech synthesis, and now I know
> that HTS uses "model interpolation" and "speaker adaptation" to
> synthesize emotional speech and speech with various styles.
Yes.
> About "speaker adaptation", it means that for synthesizing speech
> with various styles, we must convert speech features including the
> spectrum, F0, and duration.
Actually, we do not need to convert the speech features themselves; we need to convert the "statistics" (model parameters) of
these features.
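As a toy illustration of this point (plain Python with hypothetical numbers, not HTS code), adaptation modifies the stored Gaussian statistics of the model, while the synthesized feature frames themselves are never converted:

```python
# Toy sketch, NOT HTS code: adaptation changes the model's Gaussian
# statistics; the speech features themselves are never converted.

# Hypothetical single state modelling F0 in Hz (assumed values).
neutral_state = {"mean": 100.0, "var": 25.0}

def adapt_state(state, mean_shift, var_scale):
    """Return adapted statistics without touching any feature vectors."""
    return {"mean": state["mean"] + mean_shift,
            "var": state["var"] * var_scale}

# A 'happy' voice might raise F0 and widen its range (assumed numbers).
happy_state = adapt_state(neutral_state, mean_shift=20.0, var_scale=1.5)
print(happy_state)  # {'mean': 120.0, 'var': 37.5}
```

Synthesis then generates features from the adapted statistics, so no per-frame conversion step is needed at run time.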
> But when I want to synthesize emotional speech, is it necessary to
> convert the spectrum?
> Is it enough to get emotional speech by converting only F0 and duration?
In my opinion, converting the spectrum will also help to synthesize emotional speech.
> In addition, how can I get real-time emotional conversion, for example
> from sad to happy?
Certainly we can use "model interpolation" and "speaker adaptation",
but both require time in the training stage.
For speaker interpolation you have to prepare a number of models using sufficient emotional speech samples.
Recording speech and training HMMs may take some time.
On the other hand, for speaker adaptation you only need one set of HMMs trained using neutral speech and a few emotional
speech samples for adaptation.
Speaker (emotion) adaptation can be done off-line, so synthesizing emotional speech does not require any additional time.
You can also use adapted models for interpolation.
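For instance, once sad and happy models exist, a gradual sad-to-happy transition can be realized by interpolating their parameters at synthesis time. A minimal sketch in plain Python (the per-state F0 means below are hypothetical, and this simplifies the actual HTS interpolation, which also covers spectrum and duration models):

```python
# Toy sketch of model interpolation (assumed simplification of the
# HTS technique): blend Gaussian means of two emotion-dependent models
# with weights that sum to 1.

def interpolate_means(means_a, means_b, w):
    """Blend per-state means: w*A + (1-w)*B, for 0 <= w <= 1."""
    return [w * a + (1.0 - w) * b for a, b in zip(means_a, means_b)]

happy_means = [130.0, 140.0, 135.0]  # hypothetical per-state F0 means (Hz)
sad_means   = [90.0, 95.0, 92.0]

# Sweeping w from 0 to 1 morphs the voice gradually from sad to happy;
# only the (cheap) interpolation happens at synthesis time, since the
# models themselves were trained or adapted off-line.
halfway = interpolate_means(happy_means, sad_means, 0.5)
print(halfway)  # [110.0, 117.5, 113.5]
```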
> Does anyone know "speech synthesis driven by emotional function"?
I don't know what "emotional function" is.
In the HMM-based speech synthesis system with MLLR-based speaker (emotion) adaptation, I think the linear transformation
matrices for the means and variances of the HMMs can be viewed as functions representing the relationship between
neutral and emotional speech.
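To make that concrete, here is a minimal sketch of such an MLLR-style mean transform in plain Python (a simplification: the matrix A and bias b below are hypothetical hand-picked values, not transforms estimated from adaptation data):

```python
# Toy sketch of an MLLR-style mean transform (assumed simplification):
# the adapted mean is an affine function of the neutral mean,
#     mu_emotional = A @ mu_neutral + b,
# where A and b would be estimated from a few emotional adaptation
# utterances. Variance transforms are handled analogously (not shown).

def apply_mllr_mean(A, b, mu):
    """Matrix-vector product plus bias, in plain Python."""
    return [sum(A[i][j] * mu[j] for j in range(len(mu))) + b[i]
            for i in range(len(A))]

A = [[1.2, 0.0],
     [0.0, 0.5]]           # per-dimension scaling (assumed values)
b = [15.0, -0.25]          # bias (assumed values)
mu_neutral = [100.0, 2.0]  # e.g. [F0 mean, a spectral mean] (assumed)

mu_emotional = apply_mllr_mean(A, b, mu_neutral)
print(mu_emotional)  # [135.0, 0.75]
```

In this view, (A, b) plays the role of a learned "function" mapping neutral-speech statistics to emotional-speech statistics.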
Best regards,
Heiga Zen (Byung Ha Chun)
--
------------------------------------------------
Heiga ZEN (in Japanese pronunciation)
Byung-Ha CHUN (in Korean pronunciation)
Department of Computer Science and Engineering
Graduate School of Engineering
Nagoya Institute of Technology
Japan
e-mail: zen@xxxxxxxxxxxxxxxx
web: http://kt-lab.ics.nitech.ac.jp/~zen
------------------------------------------------
- Follow-Ups
-
- [hts-users:00201] Re: emotional speech synthesis using hts, 刘 磊
- References
-
- [hts-users:00199] emotional speech synthesis using hts, 刘 磊