
[hts-users:02982] Re: diagonalization of covariance matrices


Hi

 

I know there are some similar experiments in speech recognition, but not many in speech synthesis.

To my knowledge, Mark Gales has done ongoing research on the covariance structure of MLLR transforms for noise-robust speech recognition. In his early ICSLP 1996 article, "Variance Compensation within the MLLR Framework for Robust Speech Recognition and Speaker Adaptation", full, block-diagonal, and diagonal transformation matrices were examined. The experimental results showed that block-diagonal and full matrices gave a slight improvement over the diagonal case; however, the larger the environmental mismatch, the smaller the differences became. He subsequently proposed semi-tied covariance matrices, which were shown to be effective.
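For intuition, the three transform structures compared in that paper can be sketched as constraints on a single matrix. This is just a toy illustration (not HTK/HTS code), and the split into two equal-sized streams is my own assumption for the example:

```python
import numpy as np

d, b = 6, 2  # toy feature dimension and number of blocks (e.g. two feature streams)
rng = np.random.default_rng(1)
W = rng.normal(size=(d, d))  # an unconstrained (full) transform matrix

# Block-diagonal: only within-stream coupling is allowed.
block = np.zeros_like(W)
s = d // b
for i in range(b):
    block[i*s:(i+1)*s, i*s:(i+1)*s] = W[i*s:(i+1)*s, i*s:(i+1)*s]

# Diagonal: each dimension is scaled independently.
diag = np.diag(np.diag(W))

# Fewer free parameters means the transform is easier to estimate
# robustly from limited adaptation data, at the cost of flexibility.
n_full = W.size            # d * d = 36
n_block = b * s * s        # 18
n_diag = d                 # 6
print("free parameters:", n_full, n_block, n_diag)
```

The parameter counts make the trade-off concrete: the diagonal case is the cheapest to estimate but can model no cross-dimension effects, which matches the finding that fuller structures help when enough data is available.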

You can also refer to Peder A. Olsen's paper "Modeling Inverse Covariance Matrices by Basis Expansion". He presents more general experiments validating his covariance modeling technique.

It seems that diagonal matrices are not accurate enough to compensate the transformation when adapting to a new speaker or a new acoustic environment. I think the same may be true for speech synthesis.
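To see why dropping the off-diagonal terms loses accuracy, here is a small self-contained sketch (a toy example, not HTS code): data from a correlated Gaussian is scored under its full covariance and under the diagonalized version that keeps only the variances.

```python
# Toy illustration: likelihood lost when a full covariance is diagonalized.
import numpy as np

rng = np.random.default_rng(0)

mean = np.zeros(2)
full_cov = np.array([[1.0, 0.8],
                     [0.8, 1.0]])              # strongly correlated dimensions
diag_cov = np.diag(np.diag(full_cov))          # keep variances, drop correlations

def log_likelihood(x, mean, cov):
    """Log-density of a multivariate Gaussian at point x."""
    d = len(mean)
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)

# Sample from the full-covariance model and score under both models.
samples = rng.multivariate_normal(mean, full_cov, size=5000)
ll_full = np.mean([log_likelihood(x, mean, full_cov) for x in samples])
ll_diag = np.mean([log_likelihood(x, mean, diag_cov) for x in samples])
print(f"avg log-likelihood, full cov: {ll_full:.3f}")
print(f"avg log-likelihood, diag cov: {ll_diag:.3f}")
```

The diagonal model always scores the correlated data worse on average; the gap grows with the strength of the correlations, which is one way to think about the performance question below.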


Regards,
Xi

On Wed, Aug 3, 2011 at 9:08 PM, Hui LIANG <tshlmail-hts@xxxxxxxxx> wrote:
Hello,

Could anyone confirm that when converting a model adapted by CMLLR transforms into an HTS engine, the resulting speaker-specific, full covariance matrices are diagonalized by HTS?

If so, I wonder whether there is any paper comparing the performance of synthesis with the original full covariance matrices and diagonalized ones? I am curious about the performance gap between the two cases.

Thank you very much in advance!

Best regards,
Hui LIANG




--
Best regards!

Xi Wang (汪曦)


References
[hts-users:02980] diagonalization of covariance matrices, Hui LIANG