I compared HTS with HTK for speech recognition purposes.
Our HTS seems to perform better than, or at least comparably to, HTK in
recognition as well.
The comparison results are listed below. For details of the experimental
conditions, please see below.
Single-Mixture Cross-word State-Clustered Triphone HMMs
MFCC (39)
HTK: WORD: %Corr=86.97, Acc=80.03 [H=8947, D=215, S=1126, I=714, N=10288]
HTS: WORD: %Corr=87.90, Acc=81.81 [H=9043, D=208, S=1037, I=626, N=10288]
PLP (39)
HTK: WORD: %Corr=93.83, Acc=92.16 [H=9653, D=165, S=470, I=172, N=10288]
HTS: WORD: %Corr=93.86, Acc=92.20 [H=9656, D=169, S=463, I=170, N=10288]
6-Mixture Cross-word State-Clustered Triphone HMMs
MFCC (39)
HTK: WORD: %Corr=93.96, Acc=90.41 [H=9667, D=88, S=533, I=366, N=10288]
HTS: WORD: %Corr=93.97, Acc=90.73 [H=9668, D=87, S=533, I=334, N=10288]
PLP (39)
HTK: WORD: %Corr=96.42, Acc=95.39 [H=9920, D=75, S=293, I=106, N=10288]
HTS: WORD: %Corr=96.58, Acc=95.47 [H=9936, D=74, S=278, I=114, N=10288]
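The %Corr and Acc figures are the standard HResults word-level scores. With N = H + D + S reference words, %Corr = H / N × 100 and Acc = (H − I) / N × 100, so Acc additionally penalises insertions; this is why HTS's lower insertion count (I) widens its Acc lead in the MFCC runs. The minimal Python sketch below recomputes both scores from the bracketed counts as a sanity check. `WordScore` is a hypothetical helper written for illustration, not part of HTK or HTS.

```python
from typing import NamedTuple


class WordScore(NamedTuple):
    """Hypothetical container for the word-level counts printed by HResults."""
    hits: int           # H
    deletions: int      # D
    substitutions: int  # S
    insertions: int     # I
    total: int          # N, number of reference words

    def correct(self) -> float:
        # %Corr = H / N * 100 (insertions are not counted)
        return 100.0 * self.hits / self.total

    def accuracy(self) -> float:
        # Acc = (H - I) / N * 100 (insertions are penalised)
        return 100.0 * (self.hits - self.insertions) / self.total

    def consistent(self) -> bool:
        # Every reference word is either a hit, a deletion, or a substitution
        return self.hits + self.deletions + self.substitutions == self.total


# Counts copied from the 6-mixture PLP (39) results above
htk = WordScore(9920, 75, 293, 106, 10288)
hts = WordScore(9936, 74, 278, 114, 10288)

for name, score in (("HTK", htk), ("HTS", hts)):
    print(f"{name}: %Corr={score.correct():.2f}, Acc={score.accuracy():.2f}, "
          f"consistent={score.consistent()}")
```

Running it reproduces the reported figures (96.42/95.39 for HTK and 96.58/95.47 for HTS), and the same check holds for the other three pairs of result lines.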