[hts-users:02020] Re: Poor voice quality & embedded reestimation (monophone) errors


I suggest you remove the files with unusually bad log likelihoods from your training set.

Build a speech recognition system and look for files whose log likelihood is unusually high or unusually
low. HERest (standard HTK) can do this for you.
If there are files whose log likelihoods fall outside mean +/- (1.5 * standard deviation) in all training
steps, remove them from your training set and retrain the system.
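
The filtering rule above can be sketched roughly as follows. This is only an illustration, not part of HTK: the function name, the input layout (one dict of per-file log likelihoods per training step) and the file names are assumptions, and the per-file values are assumed to have been collected already from the training output. It is probably safer to use per-frame averages so that file length does not dominate the comparison.

```python
from statistics import mean, pstdev


def find_outlier_files(loglik_by_step, k=1.5):
    """Return files whose log likelihood is outside mean +/- k*std in EVERY step.

    loglik_by_step: list of dicts {file name: log likelihood}, one dict per
    training step (hypothetical layout, filled in by your own log parsing).
    """
    flagged = None
    for scores in loglik_by_step:
        mu = mean(scores.values())
        sigma = pstdev(scores.values())
        # Files that deviate from the mean by more than k standard deviations.
        outliers = {name for name, ll in scores.items() if abs(ll - mu) > k * sigma}
        # Keep only files that were outliers in all steps seen so far.
        flagged = outliers if flagged is None else flagged & outliers
    return sorted(flagged or [])


# Hypothetical per-frame log likelihoods for two training steps.
step1 = {"a.lab": -60.0, "b.lab": -61.0, "c.lab": -59.0,
         "d.lab": -60.0, "e.lab": -62.0, "f.lab": -95.0}
step2 = {"a.lab": -58.0, "b.lab": -59.0, "c.lab": -57.0,
         "d.lab": -58.0, "e.lab": -60.0, "f.lab": -93.0}
to_remove = find_outlier_files([step1, step2])
```

The files returned are the candidates to drop before retraining; the threshold k = 1.5 is the value suggested above and can be adjusted.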

Sometimes the labeling is fine and there is nothing wrong with the audio, but for some reason some files
simply don't match the majority of the training set, and they can cause problems when you train a system
(synthesis or recognition).

HERest has some problems when the number of training files (or the total amount of training audio) is large
and a single machine is used. When this happens, the transition matrix probabilities stop summing to 1 and
training breaks.

I don't believe that is your case, since it usually happens with more than 100 hours of training speech.
It did happen once with a noisy training set (only a few hours of training data).
In that case, split your training across multiple machines (HERest) and combine the resulting HMMs.
This will solve the problem.
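
As a rough sketch of the splitting step, the training file list can be divided into one part per machine. The function name and the file names here are made up for illustration. As far as I know, HERest has a parallel mode (the -p option) in which each job writes an accumulator file and a final pass combines them, but please check the HTK Book for the exact usage before relying on it.

```python
def split_file_list(paths, n_parts):
    """Split a list of training files into n_parts lists, round-robin.

    Round-robin keeps the parts similar in size and avoids putting all files
    from one recording session into the same part.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    return [paths[i::n_parts] for i in range(n_parts)]


# Hypothetical training files, split for two machines.
train_files = ["utt001.mfc", "utt002.mfc", "utt003.mfc", "utt004.mfc", "utt005.mfc"]
parts = split_file_list(train_files, 2)
```

Each part can then be written to its own script file and given to a separate HERest job.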



Luis Felipe Uebel


