[hts-users:00284] Re:


From: "Ola Lars Haugen" <olalars@xxxxxxxxx>
Subject: [hts-users:00281] 
Date: Tue, 25 Apr 2006 19:17:45 +0200
Message-ID: <e199440b0604251017l75be9b50u9d65f48301577421@xxxxxxxxxxxxxx>

> In HTS you use mcep coefficients and not MFCC. What's the main difference
> between these parameters?

We have to synthesize speech from spectral parameters (e.g.,
MFCCs) generated from the HMMs, but there is no mathematically
well-defined method for synthesizing speech from MFCCs (though
sinusoidal models could be used).  HTS instead uses mel-cepstral
coefficients, which are extracted by the SPTK command "mcep".
We can synthesize speech directly from the mel-cepstral
coefficients using the MLSA filter, whose coefficients are given
by the mel-cepstral coefficients.
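
As a rough illustration of why mel-cepstral coefficients map
directly onto a synthesis filter, the sketch below evaluates the
log-magnitude envelope log|H(e^{jw})| = sum_m c(m) cos(m * b(w)),
where b(w) is the phase response of the first-order all-pass
function (z^-1 - alpha) / (1 - alpha z^-1).  This is the spectrum
that the MLSA filter approximates.  The function names are
hypothetical, and alpha = 0.42 is only an assumed value often
used for 16 kHz speech; this is not the SPTK or HTS code.

```python
import numpy as np

def warped_frequency(omega, alpha):
    """Phase response b(w) of the all-pass (z^-1 - alpha)/(1 - alpha z^-1)."""
    return np.arctan2((1.0 - alpha ** 2) * np.sin(omega),
                      (1.0 + alpha ** 2) * np.cos(omega) - 2.0 * alpha)

def mcep_to_log_magnitude(mcep, alpha, n_points=257):
    """Natural-log magnitude envelope on n_points frequencies in [0, pi]."""
    omega = np.linspace(0.0, np.pi, n_points)
    warped = warped_frequency(omega, alpha)
    orders = np.arange(len(mcep))
    # log|H| is the real part of sum_m c(m) exp(-j m b(w))
    return np.cos(np.outer(warped, orders)) @ np.asarray(mcep, dtype=float)

# Hypothetical coefficients, only to exercise the function
example_mcep = [0.5, 1.0, -0.3, 0.1]
envelope = mcep_to_log_magnitude(example_mcep, alpha=0.42)
```

With alpha = 0 the warping disappears and the coefficients
reduce to an ordinary (linear-frequency) cepstrum.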

See

  Toshiaki Fukada, Keiichi Tokuda, Takao Kobayashi and Satoshi
  Imai, ``An adaptive algorithm for mel-cepstral analysis of
  speech,'' Proceedings of IEEE International Conference on
  Acoustics, Speech, and Signal Processing, vol.1, pp.137-140,
  Mar. 1992.

  Keiichi Tokuda, Takao Kobayashi, Takashi Masuko and Satoshi
  Imai, ``Mel-generalized cepstral analysis ---a unified
  approach to speech spectral estimation,'' Proceedings of
  International Conference on Spoken Language Processing, vol.3,
  pp.1043-1046, Sep. 1994.

for the MLSA filter and the mel-cepstral analysis technique.
Note that we don't use the adaptive version of mel-cepstral
analysis in HTS.

You may find PDFs of these papers at

  http://kt-lab.ics.nitech.ac.jp/~tokuda/selected_pub/

Keiichi Tokuda
tokuda@xxxxxxxxxxxxxxxx
http://kt-lab.ics.nitech.ac.jp/~tokuda/

References
[hts-users:00281], Ola Lars Haugen