
[hts-users:01213] WER Comparison of HTK and HTS


Hi all,

I compared HTS with HTK for speech recognition.
HTS seems to work at least as well as HTK, and often slightly better, for recognition too. The comparison results are given first; the experimental conditions are described further down.

Single Mixture Cross-word State-Clustered Triphone HMMs
MFCC(39)
HTK: WORD: %Corr=86.97, Acc=80.03 [H=8947, D=215, S=1126, I=714, N=10288]
HTS: WORD: %Corr=87.90, Acc=81.81 [H=9043, D=208, S=1037, I=626, N=10288]

PLP(39)
HTK: WORD: %Corr=93.83, Acc=92.16 [H=9653, D=165, S=470, I=172, N=10288]
HTS: WORD: %Corr=93.86, Acc=92.20 [H=9656, D=169, S=463, I=170, N=10288]

6 Mixtures Cross-word State-Clustered Triphone HMMs
MFCC(39)
HTK: WORD: %Corr=93.96, Acc=90.41 [H=9667, D=88, S=533, I=366, N=10288]
HTS: WORD: %Corr=93.97, Acc=90.73 [H=9668, D=87, S=533, I=334, N=10288]

PLP(39)
HTK: WORD: %Corr=96.42, Acc=95.39 [H=9920, D=75, S=293, I=106, N=10288]
HTS: WORD: %Corr=96.58, Acc=95.47 [H=9936, D=74, S=278, I=114, N=10288]


-------------- Reference (HTS does not fully support MPE yet) ----------
6 Mixtures Cross-word State-Clustered Triphone HMMs (MPE)
MFCC(39)
HTK: WORD: %Corr=93.83, Acc=90.51 [H=9653, D=90, S=545, I=341, N=10288]
HTS: WORD: %Corr=93.69, Acc=90.66 [H=9639, D=95, S=554, I=312, N=10288]

PLP(39)
HTK: WORD: %Corr=96.56, Acc=95.56 [H=9934, D=71, S=283, I=103, N=10288]
HTS: WORD: %Corr=96.35, Acc=95.36 [H=9912, D=78, S=298, I=101, N=10288]
------------------------------------------------------------------------
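For readers who want to turn the WORD lines above into word error rates, here is a minimal Python sketch. It assumes the usual HResults definitions, %Corr = 100*H/N and Acc = 100*(H-I)/N, and takes WER = 100*(D+S+I)/N, which equals 100 - Acc whenever H+D+S = N. The function names `parse_hresults` and `score` are made up for this illustration and are not part of HTK or HTS.

```python
import re


def parse_hresults(line):
    """Pull the H, D, S, I, N counts out of an HResults WORD summary line."""
    # Matches single-letter keys such as "H=8947"; "Corr=" and "Acc=" are skipped
    # because the letter before "=" must start at a word boundary.
    return {key: int(val) for key, val in re.findall(r"\b([HDSIN])=(\d+)", line)}


def score(counts):
    """Return (%Corr, Acc, WER) in percent from the raw counts."""
    n = counts["N"]
    corr = 100.0 * counts["H"] / n
    acc = 100.0 * (counts["H"] - counts["I"]) / n
    wer = 100.0 * (counts["D"] + counts["S"] + counts["I"]) / n
    return corr, acc, wer


if __name__ == "__main__":
    # Single-mixture MFCC results copied from the table above.
    lines = {
        "HTK": "WORD: %Corr=86.97, Acc=80.03 [H=8947, D=215, S=1126, I=714, N=10288]",
        "HTS": "WORD: %Corr=87.90, Acc=81.81 [H=9043, D=208, S=1037, I=626, N=10288]",
    }
    for name, line in lines.items():
        corr, acc, wer = score(parse_hresults(line))
        print(f"{name}: %Corr={corr:.2f} Acc={acc:.2f} WER={wer:.2f}")
```

Since WER is simply 100 - Acc here, the Acc column can be read directly as the WER comparison named in the subject line.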

The conditions were as follows:
Training Data: Resource Management (int_trn109: see RMHTK for details)
Test data:  feb89 feb91 oct89 sep92
Analysis:
MFCC
SOURCEKIND     = WAVEFORM
SOURCEFORMAT   = WAV
SOURCERATE     = 625
ZMEANSOURCE    = FALSE
TARGETKIND     = MFCC_E
TARGETRATE     = 100000
WINDOWSIZE     = 250000.0
PREEMCOEF      = 0.97
#USEPOWER       = FALSE
NUMCHANS       = 24
LPCORDER       = 12
CEPLIFTER      = 22
NUMCEPS        = 12
ENORMALISE     = TRUE
ESCALE         = 1.0
DELTAWINDOW    = 2
ACCWINDOW      = 2

PLP
SOURCEKIND     = WAVEFORM
SOURCEFORMAT   = WAV
SOURCERATE     = 625
ZMEANSOURCE    = FALSE
TARGETKIND     = PLP_0
TARGETRATE     = 100000
WINDOWSIZE     = 250000.0
PREEMCOEF      = 0.97
USEPOWER       = TRUE
NUMCHANS       = 24
LPCORDER       = 12
CEPLIFTER      = 22
NUMCEPS        = 12
ENORMALISE     = TRUE
ESCALE         = 1.0
DELTAWINDOW    = 2
ACCWINDOW      = 2

Feature: MFCC_E_D_A, PLP_0_D_A (static + delta + delta^2)
HMM: 3-state left-to-right speaker-independent HMM without skip paths
Training Procedures & Model Topology: Same
Number of model parameters: Same
Comparison method for HTK and HTS: Replacement of binary files
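
The analysis settings above can be read more easily in physical units. The sketch below is only an illustration, assuming HTK's convention that SOURCERATE, TARGETRATE and WINDOWSIZE are given in units of 100 ns; the helper names are hypothetical. It also checks that 12 cepstra plus one energy (_E) or c0 (_0) term, with delta and acceleration coefficients, gives the 39 dimensions quoted in the results. Other qualifiers (for example _T or _N) are deliberately not handled.

```python
HTK_TIME_UNIT_S = 100e-9  # HTK expresses times in units of 100 ns


def htk_units_to_ms(value):
    """Convert an HTK time value (100 ns units) to milliseconds."""
    return value * HTK_TIME_UNIT_S * 1e3


def sample_rate_hz(source_rate):
    """Convert SOURCERATE (sample period in 100 ns units) to a rate in Hz."""
    return 1.0 / (source_rate * HTK_TIME_UNIT_S)


def feature_dim(num_ceps, target_kind):
    """Vector size for kinds such as MFCC_E_D_A or PLP_0_D_A."""
    qualifiers = target_kind.split("_")[1:]
    # _E (log energy) and _0 (c0) each add one static coefficient.
    static = num_ceps + sum(q in ("E", "0") for q in qualifiers)
    # _D and _A each append one more block the size of the static part.
    blocks = 1 + sum(q in ("D", "A") for q in qualifiers)
    return static * blocks


if __name__ == "__main__":
    print("sample rate (Hz):", round(sample_rate_hz(625)))
    print("frame shift (ms):", round(htk_units_to_ms(100000), 3))
    print("window size (ms):", round(htk_units_to_ms(250000.0), 3))
    print("MFCC_E_D_A dim:", feature_dim(12, "MFCC_E_D_A"))
    print("PLP_0_D_A dim:", feature_dim(12, "PLP_0_D_A"))
```

Under these assumptions the settings correspond to 16 kHz audio, a 10 ms frame shift and a 25 ms analysis window.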

Junichi Yamagishi
CSTR

