[hts-users:00284] Re:
From: "Ola Lars Haugen" <olalars@xxxxxxxxx>
Subject: [hts-users:00281]
Date: Tue, 25 Apr 2006 19:17:45 +0200
Message-ID: <e199440b0604251017l75be9b50u9d65f48301577421@xxxxxxxxxxxxxx>
> In HTS you use mcep coefficients and not MFCC. What's the main difference
> between these parameters?
We have to synthesize speech from the speech spectral parameters
(e.g., MFCCs) generated from the HMMs, but there is no
mathematically well-defined method for synthesizing speech from
MFCCs (although sinusoidal models could be used). HTS therefore
uses mel-cepstral coefficients, which are extracted with the SPTK
command "mcep". We can synthesize speech from the mel-cepstral
coefficients using the MLSA (mel log spectrum approximation)
filter, whose coefficients are given by the mel-cepstral
coefficients.
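To make the point concrete, here is a minimal sketch (not SPTK or HTS code) of why mel-cepstral coefficients define a spectrum directly. It assumes the model H(z) = exp(sum_m c~(m) z~^-m), where z~^-1 = (z^-1 - alpha)/(1 - alpha z^-1) is a first-order all-pass that warps the frequency axis. The function names, the coefficient values, and alpha = 0.42 (a value often quoted for 16 kHz speech) are illustrative assumptions. The MLSA filter realizes this same exponential transfer function approximately as a time-domain filter; the sketch only evaluates it on the unit circle.

```python
import cmath
import math
from typing import Sequence


def warped_frequency(omega: float, alpha: float) -> float:
    """Map a normalized frequency omega (rad, 0..pi) through the phase of the
    all-pass z~^-1 = (z^-1 - alpha) / (1 - alpha z^-1) (hypothetical helper)."""
    return math.atan2((1.0 - alpha * alpha) * math.sin(omega),
                      (1.0 + alpha * alpha) * math.cos(omega) - 2.0 * alpha)


def mcep_log_amplitude(mc: Sequence[float], alpha: float, omega: float) -> float:
    """Natural-log amplitude log|H(e^{j omega})| of the assumed model
    H(z) = exp(sum_m mc[m] * z~^-m), evaluated on the unit circle."""
    z_inv = cmath.exp(-1j * omega)
    # On the unit circle the all-pass has unit magnitude and only warps
    # the frequency axis; with alpha = 0 it reduces to plain z^-1.
    z_tilde_inv = (z_inv - alpha) / (1.0 - alpha * z_inv)
    log_h = sum(c * z_tilde_inv ** m for m, c in enumerate(mc))
    # Real part of log H is the log amplitude; the imaginary part is the phase.
    return log_h.real


if __name__ == "__main__":
    # Hypothetical coefficients for illustration only, not from a real analysis.
    mc = [1.0, 0.5, -0.2, 0.1]
    alpha = 0.42  # assumed warping factor
    for k in range(5):
        omega = math.pi * k / 4
        print(f"omega={omega:.3f}  log|H|={mcep_log_amplitude(mc, alpha, omega):.4f}")
```

With alpha = 0 this reduces to the ordinary cepstrum, sum_m c(m) cos(m omega); a positive alpha stretches the low-frequency region, which gives the mel-like resolution.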
See
Toshiaki Fukada, Keiichi Tokuda, Takao Kobayashi and Satoshi
Imai, "An adaptive algorithm for mel-cepstral analysis of
speech," Proceedings of IEEE International Conference on
Acoustics, Speech, and Signal Processing, vol.1, pp.137-140,
Mar. 1992.
Keiichi Tokuda, Takao Kobayashi, Takashi Masuko and Satoshi
Imai, "Mel-generalized cepstral analysis - a unified
approach to speech spectral estimation," Proceedings of
International Conference on Spoken Language Processing, vol.3,
pp.1043-1046, Sep. 1994.
for the MLSA filter and the mel-cepstral analysis technique. It
should be noted that we do not use the adaptive version of
mel-cepstral analysis in HTS.
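For orientation, the non-adaptive analysis can be summarized as follows. This is our reading of the cited papers, so check the notation against them. The spectrum is modeled through the warped all-pass z~^-1, and the coefficients are chosen to minimize the criterion E computed over one windowed frame. Here I_N(omega) is taken to be the (modified) periodogram of an N-sample windowed frame.

```latex
H(z) = \exp \sum_{m=0}^{M} \tilde{c}(m)\, \tilde{z}^{-m},
\qquad
\tilde{z}^{-1} = \frac{z^{-1} - \alpha}{1 - \alpha z^{-1}}, \quad |\alpha| < 1

E = \frac{1}{2\pi} \int_{-\pi}^{\pi}
    \left\{ \exp R(\omega) - R(\omega) - 1 \right\} d\omega,
\qquad
R(\omega) = \log I_N(\omega) - \log \left| H(e^{j\omega}) \right|^{2}
```

Minimizing E with respect to the coefficients \tilde{c}(m) yields the mel-cepstrum of the frame. The adaptive algorithm in the 1992 paper instead updates the coefficients sample by sample.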
You can find the PDFs at
http://kt-lab.ics.nitech.ac.jp/~tokuda/selected_pub/
Keiichi Tokuda
tokuda@xxxxxxxxxxxxxxxx
http://kt-lab.ics.nitech.ac.jp/~tokuda/