
[hts-users:00650] Re: F0 contours become flat


Hi,

I have obtained the same results as those described in Dr. Toda's comment.

In my objective experiments, the generation algorithm considering GV
yielded a larger mel-cepstral distance and a larger RMSE (root mean
square error) of logF0 than the standard algorithm.
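
For readers who want to reproduce this kind of objective comparison, here
is a minimal sketch of the two measures. It is an illustration, not the
code used in these experiments. It assumes the reference and generated
frames are already time-aligned and equal in number, that unvoiced frames
are marked by F0 = 0, and that the 0th (energy) mel-cepstral coefficient
is excluded. The function names are hypothetical.

```python
import math


def log_f0_rmse(ref_f0, gen_f0):
    """RMSE of log F0, computed only over frames voiced in both contours.

    Assumption: unvoiced frames are encoded as F0 <= 0.
    """
    sq_errors = [
        (math.log(r) - math.log(g)) ** 2
        for r, g in zip(ref_f0, gen_f0)
        if r > 0 and g > 0
    ]
    if not sq_errors:
        raise ValueError("no frames are voiced in both contours")
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def mel_cepstral_distance(ref_mcep, gen_mcep):
    """Mean mel-cepstral distance in dB over aligned frames.

    Each frame is a sequence of coefficients; index 0 (energy) is skipped.
    """
    const = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref_frame, gen_frame in zip(ref_mcep, gen_mcep):
        sq = sum((a - b) ** 2 for a, b in zip(ref_frame[1:], gen_frame[1:]))
        total += const * math.sqrt(sq)
    return total / len(ref_mcep)
```

Both functions return 0 for identical inputs. Larger values mean the
generated parameters are numerically further from the natural ones, which
says nothing by itself about perceived quality.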

However, in my subjective evaluations, synthetic speech generated by
the algorithm considering GV was judged to be significantly better
than speech generated by the standard one, and more similar to the
real speech of the speaker.

This implies that mel-cepstral distance and the RMSE of logF0 are
NOT perfect criteria.

Junichi Yamagishi
The Centre for Speech Technology Research
University of Edinburgh


On 2007/04/19, at 7:13, tomoki@xxxxxxxxxxx wrote:

Hi,

Perhaps measuring the RMSE in the F0 could show if the predicted contour with GV minimizes the error in comparison with the conventional method.

Note that the generation algorithm considering GV usually causes LARGER errors compared with the conventional one. This tendency is observed in both mel-cep and F0 generation.

The HMM likelihood for a parameter trajectory generated by the conventional algorithm is too large compared with that for a natural one. This implies that we need not maximize the HMM likelihood alone. This point is described in the following papers:

   T. Toda, A.W. Black, and K. Tokuda, ICASSP2005.
   T. Toda and K. Tokuda, IEICE, May 2007 (to appear).

Note that the ICASSP paper describes only mel-cep generation in voice conversion, but we see similar results for mel-cep and F0 generation in HTS as well. Those HTS results are described in the IEICE paper.

Thanks,
Tomoki Toda
  Nara Institute of Science and Technology
  E-mail: tomoki@xxxxxxxxxxx
  TEL: +81-743-72-5282
  FAX: +81-743-72-5289
----- Original Message -----
From: "Xavi Gonzalvo" <gonzalvo@xxxxxxxxxxxxx>
To: <hts-users@xxxxxxxxxxxxxxxxxxxxxxxxx>
Sent: Tuesday, April 17, 2007 6:27 PM
Subject: [hts-users:00645] Re: F0 contours become flat


Hi,

I've been reading about GV as explained in your Interspeech 2005 article, and I notice that the MOS improves when GV is applied to the mel-cepstrum, but the perceptual tests don't show an improvement when it is applied to F0 (as explained in the paper, perhaps caused by some data errors).

Perhaps measuring the RMSE in the F0 could show if the predicted contour with GV minimizes the error in comparison with the conventional method.

Greetings

-----Original Message-----
From: Heiga ZEN (Byung Ha CHUN) [mailto:zen@xxxxxxxxxxxxxxxx]
Sent: Tuesday, March 13, 2007 23:49
To: hts-users@xxxxxxxxxxxxxxxxxxxxxxxxx
Subject: [hts-users:00577] Re: F0 contours become flat

Hi,

Xavi Gonzalvo wrote:

Let me add to this that we carried out some tests on Spanish comparing the F0 contour from HTS 1.1.1 with the one obtained from a CBR (Case-Based Reasoning) algorithm. The HMM contours were much flatter than the CBR ones, especially when phrases were interrogative and short.

In our internal version we use GV.
It can reduce this problem.
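
As a rough illustration of what "flat" means here: GV (global variance)
is the variance of a parameter trajectory taken over a whole utterance,
so a flattened contour has a smaller GV than a natural one. The sketch
below is a toy example, not the HTS implementation. The function names
and the shrink-toward-the-mean model of flattening are assumptions made
only for illustration.

```python
def global_variance(trajectory):
    """Variance of a 1-D parameter trajectory (e.g. logF0) over one utterance."""
    n = len(trajectory)
    mean = sum(trajectory) / n
    return sum((x - mean) ** 2 for x in trajectory) / n


def flatten(trajectory, factor):
    """Shrink a trajectory toward its mean (hypothetical model of flattening).

    factor = 1.0 leaves the contour unchanged; factor = 0.0 makes it constant.
    """
    mean = sum(trajectory) / len(trajectory)
    return [mean + factor * (x - mean) for x in trajectory]


if __name__ == "__main__":
    # Toy logF0 values for voiced frames of one utterance (made-up numbers).
    natural = [4.6, 4.8, 5.1, 5.3, 5.0, 4.7]
    flat = flatten(natural, 0.5)
    print("GV natural:", global_variance(natural))
    print("GV flat:   ", global_variance(flat))
```

Under this toy model, shrinking by a factor of 0.5 reduces the GV to one
quarter of its original value, which is the kind of loss of dynamic range
that a GV term in parameter generation is meant to counteract.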

Regards,

Heiga ZEN (Byung Ha CHUN)

--
------------------------------------------------
Heiga ZEN     (in Japanese pronunciation)
Byung Ha CHUN (in Korean pronunciation)

Department of Computer Science and Engineering
Nagoya Institute of Technology
Gokiso-cho, Showa-ku, Nagoya 466-8555 Japan

http://www.sp.nitech.ac.jp/~zen
------------------------------------------------



