
[hts-users:00651] Re: F0 contours become flat


Thank you very much for your explanations.

I also believe RMSE is not the best way to objectively analyze the
generated parameters, but it was the first approach we decided to apply in
our study, together with perceptual tests.

Intuitively, I think that if the synthesis units are trained as HMMs and
clustered all together, they cannot reliably represent the real
parameters (with minimum error), even though they can produce excellent,
natural-sounding speech. I guess this may be related to what you are
telling me.

I will continue testing these topics. Again, thanks a lot for your
time; this has been really useful to me.

Regards


-----Original Message-----
From: Junichi Yamagishi [mailto:v1jyamag@xxxxxxxxxxxx] 
Sent: Thursday, April 19, 2007 22:57
To: hts-users@xxxxxxxxxxxxxxx
CC: Junichi Yamagishi
Subject: [hts-users:00650] Re: F0 contours become flat

Hi,

I have obtained the same results that Dr. Toda describes.

In my objective experiments, the generation algorithm considering GV
(global variance) caused a larger mel-cepstral distance and a larger
RMSE (root mean square error) of log F0 than the standard algorithm.
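
For readers who want to reproduce this kind of comparison, the two objective measures can be sketched as below. This is a minimal sketch under stated assumptions, not the definitions used in the experiments above (the thread does not give them): contours are already time-aligned frame by frame, unvoiced frames are marked with F0 <= 0, log F0 uses the natural logarithm, and the mel-cepstral distance uses the common dB formula that skips the 0th (energy) coefficient. The function names are hypothetical.

```python
import math


def logf0_rmse(f0_ref, f0_gen):
    """RMSE of natural-log F0 over frames voiced in both contours.

    Assumes time-aligned contours in Hz, with unvoiced frames marked as <= 0.
    """
    if len(f0_ref) != len(f0_gen):
        raise ValueError("contours must be time-aligned (same length)")
    diffs = [math.log(r) - math.log(g)
             for r, g in zip(f0_ref, f0_gen) if r > 0 and g > 0]
    if not diffs:
        raise ValueError("no frame is voiced in both contours")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def mel_cepstral_distance_db(mc_ref, mc_gen):
    """Mean mel-cepstral distance in dB over time-aligned frames.

    Per frame: (10 / ln 10) * sqrt(2 * sum_{d>=1} (c_d - c'_d)^2); c_0 is skipped.
    """
    if len(mc_ref) != len(mc_gen) or not mc_ref:
        raise ValueError("need the same non-zero number of frames")
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    per_frame = [scale * math.sqrt(sum((a - b) ** 2 for a, b in zip(r[1:], g[1:])))
                 for r, g in zip(mc_ref, mc_gen)]
    return sum(per_frame) / len(per_frame)


# Hypothetical three-frame example: only frames 0 and 2 are voiced in both contours.
print(logf0_rmse([100.0, 0.0, 200.0], [100.0, 150.0, 100.0]))
```

In the example, the only nonzero log difference is ln 2, on one of the two shared voiced frames, so the RMSE is ln 2 / sqrt(2). Many papers report log-F0 errors in cents instead (1200 * log2), which only changes the scale.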

On the other hand, in my subjective evaluations, synthetic speech
generated with the GV-based algorithm was judged significantly better
than that generated with the standard one, and more similar to the
speaker's real speech.

This implies that neither the mel-cepstral distance nor the RMSE of
log F0 is a perfect criterion.
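
A toy sketch may help show how both observations can hold at once. The contours are made-up log-F0 values, and `scale_to_variance` is a hypothetical, simplified variance-scaling stand-in, not the GV-based parameter generation algorithm itself; it only mimics that algorithm's effect of pushing an oversmoothed trajectory's global variance back toward the natural level.

```python
import math


def global_variance(traj):
    """Global variance: mean squared deviation of a trajectory from its own mean."""
    mu = sum(traj) / len(traj)
    return sum((x - mu) ** 2 for x in traj) / len(traj)


def rmse(a, b):
    """Root mean square error between two time-aligned trajectories."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def scale_to_variance(traj, target_var):
    """Stretch a trajectory around its own mean until its GV equals target_var.

    Simplified stand-in for GV-aware generation, used here only for illustration.
    """
    mu = sum(traj) / len(traj)
    factor = math.sqrt(target_var / global_variance(traj))
    return [mu + factor * (x - mu) for x in traj]


# Hypothetical log-F0 values (natural log of Hz) for a four-frame segment.
natural = [5.1, 5.1, 4.9, 4.9]
smoothed = [5.1, 5.0, 5.0, 4.9]  # oversmoothed, like a conventionally generated contour
restored = scale_to_variance(smoothed, global_variance(natural))

for name, traj in [("smoothed", smoothed), ("restored", restored)]:
    print(f"{name}: GV={global_variance(traj):.4f}  RMSE={rmse(natural, traj):.4f}")
```

Here the smoothed contour has half the natural GV (0.005 vs 0.01) and an RMSE of about 0.0707; the restored contour matches the natural GV, but its RMSE rises to about 0.0765. This is not an accident of the numbers: if a predicted shape has correlation 0 < ρ < 1 with the natural deviations (variance σ²) around a correctly predicted mean, the least-squares-optimal scaling gives a mean squared error of σ²(1 − ρ²) but a GV of only ρ²σ², while stretching it to the natural GV gives 2σ²(1 − ρ), which is always larger.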

Junichi Yamagishi
The Centre for Speech Technology Research
University of Edinburgh


On 2007/04/19, at 7:13, tomoki@xxxxxxxxxxx wrote:

> Hi,
>
>> Perhaps measuring the RMSE in the F0 could show if the predicted contour
>> with GV minimizes the error in comparison with the conventional method.
>
> Note that the generation algorithm considering GV usually causes LARGER
> errors compared with the conventional one. This tendency is observed in
> both mel-cep and F0 generation.
>
> The HMM likelihood of a parameter trajectory generated by the conventional
> algorithm is too large compared with that of a natural one. This implies
> that we don't have to maximize only the HMM likelihood. This point is
> described in the following papers:
>
>    T. Toda, A.W. Black, and K. Tokuda, ICASSP2005.
>    T. Toda and K. Tokuda, IEICE, May 2007 (to appear).
>
> Note that the ICASSP paper describes only mel-cep generation in voice
> conversion, but we can see similar results for mel-cep and F0 generation
> in HTS as well. Those HTS results are described in the IEICE paper.
>
> Thanks,
> Tomoki Toda
>   Nara Institute of Science and Technology
>   E-mail: tomoki@xxxxxxxxxxx
>   TEL: +81-743-72-5282
>   FAX: +81-743-72-5289
> ----- Original Message -----
> From: "Xavi Gonzalvo" <gonzalvo@xxxxxxxxxxxxx>
> To: <hts-users@xxxxxxxxxxxxxxxxxxxxxxxxx>
> Sent: Tuesday, April 17, 2007 6:27 PM
> Subject: [hts-users:00645] Re: F0 contours become flat
>
>
>> Hi,
>>
>> I've been reading about GV as explained in your Interspeech 2005 article,
>> and I notice that the MOS improves when GV is applied to mel-cepstra, but
>> perceptual tests don't show an improvement when it is applied to F0 (as
>> explained in the paper, perhaps because of some data errors).
>>
>> Perhaps measuring the RMSE in the F0 could show if the predicted contour
>> with GV minimizes the error in comparison with the conventional method.
>>
>> Greetings
>>
>> -----Original Message-----
>> From: Heiga ZEN (Byung Ha CHUN) [mailto:zen@xxxxxxxxxxxxxxxx]
>> Sent: Tuesday, March 13, 2007 23:49
>> To: hts-users@xxxxxxxxxxxxxxxxxxxxxxxxx
>> Subject: [hts-users:00577] Re: F0 contours become flat
>>
>> Hi,
>>
>> Xavi Gonzalvo wrote:
>>
>>> Let me add to this that we carried out some tests in Spanish comparing
>>> the F0 contour from HTS 1.1.1 with the one obtained from a CBR
>>> (Case-Based Reasoning) algorithm. The HMM contours were much flatter
>>> than the CBR ones, especially when phrases were interrogative and short.
>>
>> In our internal version we use GV.
>> It can reduce this problem.
>>
>> Regards,
>>
>> Heiga ZEN (Byung Ha CHUN)
>>
>> -- 
>> ------------------------------------------------
>> Heiga ZEN     (in Japanese pronunciation)
>> Byung Ha CHUN (in Korean pronunciation)
>>
>> Department of Computer Science and Engineering
>> Nagoya Institute of Technology
>> Gokiso-cho, Showa-ku, Nagoya 466-8555 Japan
>>
>> http://www.sp.nitech.ac.jp/~zen
>> ------------------------------------------------
>
>
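
Tomoki Toda's remark that a conventionally generated trajectory has a much higher HMM likelihood than a natural one can be illustrated with a deliberately simplified sketch. It assumes hypothetical per-frame Gaussian statistics for static log F0 only; real HTS models also include delta and delta-delta features, so their maximum-likelihood trajectory is not simply the sequence of state means, as it is here.

```python
import math


def gaussian_loglik(traj, means, variances):
    """Log-likelihood of a trajectory under independent per-frame Gaussians."""
    return sum(-0.5 * math.log(2.0 * math.pi * v) - 0.5 * (x - m) ** 2 / v
               for x, m, v in zip(traj, means, variances))


# Hypothetical per-frame log-F0 statistics (static features only).
means = [5.0, 5.0, 5.1, 5.2, 5.2]
variances = [0.01] * len(means)

# With static features only, the maximum-likelihood trajectory is the mean sequence.
ml_traj = list(means)
# A plausible "natural" trajectory: one standard deviation (0.1) away at every frame.
natural = [m + (0.1 if i % 2 == 0 else -0.1) for i, m in enumerate(means)]

gap = (gaussian_loglik(ml_traj, means, variances)
       - gaussian_loglik(natural, means, variances))
print(f"log-likelihood gap: {gap:.2f}")  # 0.5 per frame, 2.5 in total
```

In this simplified setting a natural contour, which scatters around the state means, is always less likely than the means themselves, so maximizing only this likelihood pulls the output toward a smooth, flat trajectory. That is the motivation for also modeling GV.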


