I suggest you cut the files with bad log likelihood. Build a speech recognizer and look for files whose log likelihood is unusually high or unusually low. HERest (standard HTK) can do this for you. If there are files whose log likelihood falls outside mean +/- (1.5 * standard deviation) of your training set at every training step, remove them and retrain the system. Sometimes the labeling is fine and there is nothing wrong with the audio, but for some reason a few files simply don't match the majority of the training set, and they can cause problems when you train a system (synthesis or recognition).

HERest also has problems on a single machine when the number of training files (or the amount of training audio) is large. When this happens, the transition matrix probabilities stop summing to 1 and training breaks. I don't believe that is your case, since it usually happens with more than 100 hours of training speech. I did see it once with a noisy training set of only a few hours. In that case, split your training across multiple machines (HERest) and combine the resulting HMMs. This will solve the problem.

Luis Felipe Uebel
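A minimal sketch of the mean +/- 1.5 * standard deviation filter described above, in Python. It assumes you have already collected one log-likelihood score per training file (for example, from the HERest trace output) into a dictionary. The function name `split_outliers`, the file names, and the scores are hypothetical, and using per-frame normalized scores is an assumption, not something HTK requires.

```python
from statistics import mean, stdev


def split_outliers(loglik_per_file, k=1.5):
    """Split files into (kept, rejected) using a mean +/- k*stdev band.

    loglik_per_file: dict mapping file name -> log likelihood
    (assumed here to be normalized per frame so files of different
    lengths are comparable).
    """
    values = list(loglik_per_file.values())
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    low, high = mu - k * sigma, mu + k * sigma

    kept = sorted(f for f, ll in loglik_per_file.items() if low <= ll <= high)
    rejected = sorted(f for f, ll in loglik_per_file.items() if not (low <= ll <= high))
    return kept, rejected


# Hypothetical scores for illustration only.
scores = {
    "a.mfc": -62.0,
    "b.mfc": -61.5,
    "c.mfc": -63.0,
    "d.mfc": -62.4,
    "e.mfc": -95.0,  # clearly worse than the rest
    "f.mfc": -61.8,
    "g.mfc": -62.7,
}
kept, rejected = split_outliers(scores)
print("rejected:", rejected)
```

To follow the "every training step" rule, run this once per re-estimation pass and drop only the files that are rejected in all passes, then retrain on the remaining list.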