Hi,
Has anyone run into "metallic" or "ringing" artifacts on particular consonants in HTS, such as "z" and "th"?
I've attached a couple of examples: one in English, "this is the zombie", from the cmu-slt voice, and one in Russian with the text "zoya zabrala zebru". In both cases the "z" sounds odd in almost the same way, even though the training databases are disjoint and the training systems differ (the English voice is taken from MaryTTS; the Russian one is built with hts-2.3alpha).
The training data itself shows no such effect on "z" phones. Postfiltering or changing the GV weights doesn't seem to help. Raising the MSD threshold to as high as 0.95 does help: the "z" no longer sounds metallic, but the vowels, of course, become distorted and sound whispered.
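To make the MSD-threshold trade-off concrete, here is a minimal sketch. It assumes that a frame is treated as voiced when its predicted voiced-space weight is strictly greater than the threshold; the per-frame weights below are made up for illustration and do not come from either voice. With the threshold at 0.5, every frame stays voiced. At 0.95, the "z" frames switch to unvoiced (noise) excitation, but so does a vowel frame whose weight happens to be 0.90, which matches the whispering effect.

```python
import numpy as np

# Hypothetical per-frame voiced-space weights, as an MSD stream might predict
# them for a short stretch: vowel, vowel, "z", "z", vowel.
weights = np.array([0.99, 0.90, 0.70, 0.60, 0.97])

def voiced_mask(msd_weights, threshold):
    """Assumed rule: a frame is voiced when its voiced weight exceeds the threshold."""
    return np.asarray(msd_weights) > threshold

for thr in (0.5, 0.95):
    # 1 = voiced (pulse or mixed excitation), 0 = unvoiced (noise only)
    print(thr, voiced_mask(weights, thr).astype(int).tolist())
# prints:
# 0.5 [1, 1, 1, 1, 1]
# 0.95 [1, 0, 0, 0, 1]
```

A single global threshold cannot separate "weakly voiced fricative" from "weakly voiced vowel" frames, so raising it fixes one at the expense of the other.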
The problem seems to lie mainly in the 2.5-6 kHz band, which is visible in the spectrogram (attached; see positions 0.2, 0.5 and 1.0). Applying a steep band-reject filter that almost removes this band does help, but I wonder whether there is a cleaner way to deal with it. Could tuning the MLSA filter or adjusting the mel-cepstral feature extraction parameters help?
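For reference, the band-reject workaround can be sketched as a post-processing step on the synthesized waveform. The filter type (Butterworth), the order, the zero-phase filtering and the 16 kHz sampling rate are all assumptions for illustration, not the exact filter used above. Note that it also removes legitimate energy that fricatives such as "s" and "sh" need, which is why it is a blunt workaround rather than a fix.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def band_reject(x, fs, low_hz=2500.0, high_hz=6000.0, order=8):
    """Zero-phase Butterworth band-stop filter over [low_hz, high_hz].

    Second-order sections are used for numerical stability at this order;
    sosfiltfilt runs the filter forward and backward, so there is no phase
    distortion (and the magnitude response is effectively squared).
    """
    sos = butter(order, [low_hz, high_hz], btype="bandstop", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

# Demo on a synthetic signal (assumed 16 kHz): a 1 kHz tone that should pass,
# plus a 4 kHz tone inside the problematic band that should be suppressed.
fs = 16000
t = np.arange(fs) / fs
low = np.sin(2 * np.pi * 1000 * t)
x = low + np.sin(2 * np.pi * 4000 * t)
y = band_reject(x, fs)
```

In practice `x` would be the waveform read from the synthesizer's output file; a gentler alternative in the same spirit would be a shelf or peaking cut of a few dB rather than a full notch.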
Thank you for any advice!
Regards,
Ilya