
[hts-users:04132] Dealing with "metallic" sounding of some consonants in HTS


Hi,
 
Has anyone run into a "metallic" or "ringing" sound on particular consonants in HTS, such as "z" and "th"?
 
Two examples are attached: an English one, "this is the zombie", from the cmu-slt voice, and a Russian one with the text "zoya zabrala zebru". In both cases "z" sounds strange in almost the same way, although the training databases are disjoint and even the training systems differ (the English voice is taken from MaryTTS, the Russian one is built with hts-2.3alpha).
 
The training data itself shows no such effect on "z" phones. Postfiltering and altering the GV weights do not seem to help. Setting the MSD threshold as high as 0.95 does help: the "metallic" sound on "z" disappears, but vowels are then, of course, distorted into whispering.
 
The main problem seems to lie in the 2.5-6 kHz frequency band. It is visible in the spectrogram (attached; positions 0.2, 0.5, 1.0). Applying a steep band-reject filter that almost removes this band does help, but I wonder whether there is a cleaner way to deal with it. Could tuning the MLSA filter or adjusting the mel-cepstral feature extraction parameters help?
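
For reference, the band-reject workaround could be sketched roughly as follows. This is only an illustration in Python with NumPy, not the filter actually used on the attached examples: the function name band_reject, the FFT-mask approach, and the 40 dB attenuation are all assumptions; only the 2.5-6 kHz band comes from the observation above.

```python
import numpy as np


def band_reject(signal, sample_rate, low_hz=2500.0, high_hz=6000.0,
                attenuation_db=40.0):
    """Attenuate the [low_hz, high_hz] band of a mono signal with an FFT mask.

    Hypothetical helper for illustration; the defaults mirror the band
    mentioned in the post, the attenuation value is an arbitrary choice.
    """
    signal = np.asarray(signal, dtype=np.float64)

    # Go to the frequency domain (real-input FFT keeps only non-negative bins).
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)

    # Unity gain everywhere except inside the rejected band.
    gain = np.ones_like(freqs)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    gain[in_band] = 10.0 ** (-attenuation_db / 20.0)

    # Back to the time domain, preserving the original length.
    return np.fft.irfft(spectrum * gain, n=len(signal))


if __name__ == "__main__":
    # Synthetic check: one tone outside the band, one inside it.
    sr = 16000
    t = np.arange(sr) / sr
    mix = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 4000 * t)
    out = band_reject(mix, sr)
    mag = np.abs(np.fft.rfft(out))
    print("1 kHz bin:", mag[1000], "4 kHz bin:", mag[4000])
```

A hard-edged mask like this can introduce ringing of its own, so a filter with smoother transition bands would probably be preferable in practice. Either way it only hides the symptom after synthesis.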
 
Thank you for any advice!
 
Regards,
Ilya

Attachment: russian_example_zoya-zabrala-zebru-spectrogram-60dB.png
Description: PNG image

Attachment: english_example_this-is-the-zombie-2.wav
Description: Wave audio

Attachment: russian_example_zoya-zabrala-zebru.wav
Description: Wave audio


Follow-Ups
[hts-users:04134] Re: Dealing with "metallic" sounding of some consonants in HTS, Nickolay V . Shmyrev