
[hts-users:04134] Re: Dealing with "metallic" sounding of some consonants in HTS

Subject: [hts-users:04134] Re: Dealing with "metallic" sounding of some consonants in HTS
From: Nickolay V.Shmyrev <nshmyrev@xxxxxxxxx>
Date: Thu, 02 Oct 2014 22:04:19 +0400

02.10.2014, 00:27, "Ilya Edrenkin" <ilia@xxxxxxxxxxxxxxx>:

Hi,

Has anyone faced the problem of a "metallic" or "ringing" sound in particular consonants in HTS, such as "z" and "th"?

A couple of examples are attached: an English one, "this is the zombie", from the cmu-slt voice, and a Russian one with the text "zoya zabrala zebru". In both cases "z" sounds a bit strange in almost the same way, although the training databases are disjoint and even the training systems are different (the English voice is taken from MaryTTS, the Russian one is built on hts-2.3alpha).

The training data itself does not contain such an effect for "z" phones. Postfiltering or altering the GV weights doesn't seem to help. Setting the MSD threshold as high as 0.95 does help: there is no more "metallic" sound in "z", but vowels are of course distorted into whispering.

It seems that the main problem is in the 2.5 kHz-6 kHz frequency band. It is visible in the spectrogram (attached; positions 0.2, 0.5, 1.0). Applying a steep band-reject filter (almost removing this band) does help. I wonder if there is a cleaner way to deal with it? Perhaps tuning the MLSA filter or playing with the mel-cepstral feature extraction parameters could help?
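For readers who want to reproduce the band-reject workaround mentioned above, here is a minimal sketch in plain Python. It is not the filter used for the attached samples: the windowed-sinc FIR design, the tap count, and the function names are all assumptions made for illustration, and the sample rate must be set to match your voice.

```python
import math

def lowpass_taps(cutoff_hz, fs, num_taps):
    """Hamming-windowed sinc low-pass FIR taps (num_taps must be odd)."""
    mid = (num_taps - 1) // 2
    fc = cutoff_hz / fs  # normalized cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        k = n - mid
        if k == 0:
            ideal = 2.0 * fc
        else:
            ideal = math.sin(2.0 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps

def bandstop_taps(low_hz, high_hz, fs, num_taps=201):
    """Band-reject FIR: unit impulse minus a band-pass (high LP minus low LP)."""
    lo = lowpass_taps(low_hz, fs, num_taps)
    hi = lowpass_taps(high_hz, fs, num_taps)
    taps = [l - h for l, h in zip(lo, hi)]  # negated band-pass
    taps[(num_taps - 1) // 2] += 1.0        # add the unit impulse
    return taps

def fir_filter(signal, taps):
    """Direct convolution; output has the same length as the input."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j in range(min(i + 1, len(taps))):
            acc += taps[j] * signal[i - j]
        out.append(acc)
    return out

def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))
```

With a 16 kHz signal in a list `samples`, `fir_filter(samples, bandstop_taps(2500.0, 6000.0, 16000.0))` would suppress roughly the band discussed above. More taps give a steeper transition at the cost of more computation and delay.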

Thank you for any advice!

Hello Ilya

While the sound-quality problem does exist, I think the issue with your samples is a bit different from the one you are describing.

The English sample is actually OK, since it was trained more or less properly: the "z", though a bit fuzzy, is properly voiced. As for the Russian sample, I see a few issues in it.

First of all, the "z" sound is originally voiced, and in many cases it is reduced to voiceless. In your case, in "zoya", it is definitely a voiced sound, while if you look at the spectrogram you'll see that the engine tries to make it voiceless. And the synthesizer does so inconsistently: the beginning of the "z" is voiced (0.16-0.20) and the end of it is voiceless (0.20-0.25). That unusual transition makes you think you hear an artifact, as if she were speaking through closed teeth.
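A voiced-to-voiceless switch inside a single phone, like the one described above, can also be looked for automatically. The sketch below is only an illustration: it assumes you have a per-frame F0 track of the synthesized utterance (0 meaning unvoiced), phone boundaries in seconds, and a 5 ms frame shift; the function name and the 20 ms minimum are hypothetical choices, not part of HTS.

```python
def mixed_voicing_segments(f0, segments, frame_shift=0.005, min_dur=0.02):
    """Return phones that contain substantial voiced AND unvoiced portions.

    f0:       per-frame F0 values; values <= 0 are treated as unvoiced
    segments: list of (label, start_sec, end_sec) phone boundaries
    min_dur:  minimum total duration (seconds) of each voicing class
              for the phone to be reported
    """
    min_frames = max(1, int(round(min_dur / frame_shift)))
    flagged = []
    for label, start, end in segments:
        first = int(round(start / frame_shift))
        last = min(int(round(end / frame_shift)), len(f0))
        frames = f0[first:last]
        voiced = sum(1 for v in frames if v > 0.0)
        unvoiced = len(frames) - voiced
        if voiced >= min_frames and unvoiced >= min_frames:
            # Report the voiced fraction so the worst cases can be sorted first.
            flagged.append((label, start, end, voiced / len(frames)))
    return flagged
```

Running this over a batch of synthesized sentences and listing the flagged "z" phones is one way to see how often the engine mixes voicing within a phone, before going back to the training labels.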

I see other phonetic issues in your synthesis; for example, "zabrala" is incorrectly rendered as "z a b r a l aa", while it should be closer to "z schwa b r a l aa".

Russian is a pretty flexible language, and though many believe it is read as it is written, it is very hard to train properly. For example, this "z"/"s" reduction must be properly accounted for in the training database, and you need to debug all the issues in the training database markup to get properly synthesized sounds. Unlike ASR, synthesis doesn't tolerate such mistakes.

Another possible way to improve this situation is to increase the training data size.

Follow-Ups
: [hts-users:04135] Re: Dealing with "metallic" sounding of some consonants in HTS, Ilya Edrenkin

References
: [hts-users:04132] Dealing with "metallic" sounding of some consonants in HTS, Ilya Edrenkin
