
[hts-users:02982] Re: diagonalization of covariance matrices


Hi

 

I know there are some similar experiments in speech recognition, but not many in speech synthesis.

To my knowledge, Mark Gales has done ongoing research on the covariance structure of MLLR transforms for noise-robust speech recognition. In his early ICSLP 1996 article, "Variance Compensation within the MLLR Framework for Robust Speech Recognition and Speaker Adaptation", full, block-diagonal, and diagonal transformation matrices were examined. The experimental results showed that block-diagonal and full matrices gave a slight improvement over the diagonal case; however, the larger the environmental mismatch, the smaller the differences became. He subsequently proposed semi-tied covariance matrices, which were shown to be effective.
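For intuition, the three transform structures compared in that paper can be sketched as constraints on a single matrix. This is just a toy illustration (not HTK/HTS code), and the split into two equal-sized streams is my own assumption for the example:

```python
import numpy as np

d, b = 6, 2  # toy feature dimension and number of blocks (e.g. two feature streams)
rng = np.random.default_rng(1)
W = rng.normal(size=(d, d))  # an unconstrained (full) transform matrix

# Block-diagonal: only within-stream coupling is allowed.
block = np.zeros_like(W)
s = d // b
for i in range(b):
    block[i*s:(i+1)*s, i*s:(i+1)*s] = W[i*s:(i+1)*s, i*s:(i+1)*s]

# Diagonal: each dimension is scaled independently.
diag = np.diag(np.diag(W))

# Fewer free parameters means the transform is easier to estimate
# robustly from limited adaptation data, at the cost of flexibility.
n_full = W.size            # d * d = 36
n_block = b * s * s        # 18
n_diag = d                 # 6
print("free parameters:", n_full, n_block, n_diag)
```

The parameter counts make the trade-off concrete: the diagonal case is the cheapest to estimate but can model no cross-dimension effects, which matches the finding that fuller structures help when enough data is available.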

You can also refer to Peder A. Olsen's paper "Modeling Inverse Covariance Matrices by Basis Expansion". He presents more general experiments validating his covariance modeling technique.

It seems that diagonal matrices are not accurate enough to compensate the transformation when adapting to a new speaker or a new acoustic environment. I think the same may be true for speech synthesis.
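To see why dropping the off-diagonal terms loses accuracy, here is a small self-contained sketch (a toy example, not HTS code): data from a correlated Gaussian is scored under its full covariance and under the diagonalized version that keeps only the variances.

```python
# Toy illustration: likelihood lost when a full covariance is diagonalized.
import numpy as np

rng = np.random.default_rng(0)

mean = np.zeros(2)
full_cov = np.array([[1.0, 0.8],
                     [0.8, 1.0]])              # strongly correlated dimensions
diag_cov = np.diag(np.diag(full_cov))          # keep variances, drop correlations

def log_likelihood(x, mean, cov):
    """Log-density of a multivariate Gaussian at point x."""
    d = len(mean)
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + quad)

# Sample from the full-covariance model and score under both models.
samples = rng.multivariate_normal(mean, full_cov, size=5000)
ll_full = np.mean([log_likelihood(x, mean, full_cov) for x in samples])
ll_diag = np.mean([log_likelihood(x, mean, diag_cov) for x in samples])
print(f"avg log-likelihood, full cov: {ll_full:.3f}")
print(f"avg log-likelihood, diag cov: {ll_diag:.3f}")
```

The diagonal model always scores the correlated data worse on average; the gap grows with the strength of the correlations, which is one way to think about the performance question below.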


Regards,
Xi

On Wed, Aug 3, 2011 at 9:08 PM, Hui LIANG <tshlmail-hts@xxxxxxxxx> wrote:
Hello,

Could anyone confirm that when converting a model adapted by CMLLR transforms into an HTS engine, the resulting speaker-specific, full covariance matrices are diagonalized by HTS?

If so, I wonder whether there is any paper comparing the performance of synthesis with the original full covariance matrices and diagonalized ones? I am curious about the performance gap between the two cases.

Thank you very much in advance!

Best regards,
Hui LIANG




--
Best regards!

Xi Wang (汪曦)


References
[hts-users:02980] diagonalization of covariance matrices, Hui LIANG