
[hts-users:01213] WER Comparison of HTK and HTS

Subject: [hts-users:01213] WER Comparison of HTK and HTS
From: Junichi Yamagishi <jyamagis@xxxxxxxxxxxx>
Date: Wed, 12 Mar 2008 09:06:02 +0000
Cc: Junichi Yamagishi <jyamagis@xxxxxxxxxxxx>
Delivered-to: hts-users@xxxxxxxxxxxxxxx

Hi all,

I compared HTS with HTK on a speech recognition task.

Our HTS seems to work better than (or at least comparably to) HTK in recognition as well. These are the comparison results. For details of the conditions, please see below.


Single Mixture Cross-word State-Clustered Triphone HMMs
MFCC(39)

HTK: WORD: %Corr=86.97, Acc=80.03 [H=8947, D=215, S=1126, I=714, N=10288]
HTS: WORD: %Corr=87.90, Acc=81.81 [H=9043, D=208, S=1037, I=626, N=10288]


PLP(39)
HTK: WORD: %Corr=93.83, Acc=92.16 [H=9653, D=165, S=470, I=172, N=10288]
HTS: WORD: %Corr=93.86, Acc=92.20 [H=9656, D=169, S=463, I=170, N=10288]

6 Mixtures Cross-word State-Clustered Triphone HMMs
MFCC(39)
HTK: WORD: %Corr=93.96, Acc=90.41 [H=9667, D=88, S=533, I=366, N=10288]
HTS: WORD: %Corr=93.97, Acc=90.73 [H=9668, D=87, S=533, I=334, N=10288]

PLP(39)
HTK: WORD: %Corr=96.42, Acc=95.39 [H=9920, D=75, S=293, I=106, N=10288]
HTS: WORD: %Corr=96.58, Acc=95.47 [H=9936, D=74, S=278, I=114, N=10288]


-------------- Reference (HTS does not fully support MPE yet) ------
6 Mixtures Cross-word State-Clustered Triphone HMMs (MPE)
MFCC(39)
HTK: WORD: %Corr=93.83, Acc=90.51 [H=9653, D=90, S=545, I=341, N=10288]
HTS: WORD: %Corr=93.69, Acc=90.66 [H=9639, D=95, S=554, I=312, N=10288]

PLP(39)
HTK: WORD: %Corr=96.56, Acc=95.56 [H=9934, D=71, S=283, I=103, N=10288]
HTS: WORD: %Corr=96.35, Acc=95.36 [H=9912, D=78, S=298, I=101, N=10288]
------------------------------------------------------------------------
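
For anyone who wants to turn these lines into WER figures, here is a minimal Python sketch. It assumes the usual HResults definitions, %Corr = H/N and Acc = (H-I)/N, and takes WER as 100 - Acc. The function names (parse_word_line, scores, relative_wer_reduction) are only for illustration and are not part of HTK or HTS.

```python
import re

def parse_word_line(line):
    """Pull the H, D, S, I, N counts out of an HResults-style WORD line."""
    return {key: int(val) for key, val in re.findall(r"\b([HDSIN])=(\d+)", line)}

def scores(counts):
    """Return (%Corr, Acc, WER) in percent, using the standard HResults formulas."""
    n = counts["N"]
    corr = 100.0 * counts["H"] / n                  # hits / reference words
    acc = 100.0 * (counts["H"] - counts["I"]) / n   # insertions are penalised too
    wer = 100.0 - acc                               # equals (D + S + I) / N
    return corr, acc, wer

def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction of new over baseline, in percent."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer

if __name__ == "__main__":
    # Single-mixture MFCC(39) results copied from the table above.
    htk = parse_word_line("HTK: WORD: %Corr=86.97, Acc=80.03 [H=8947, D=215, S=1126, I=714, N=10288]")
    hts = parse_word_line("HTS: WORD: %Corr=87.90, Acc=81.81 [H=9043, D=208, S=1037, I=626, N=10288]")
    for name, counts in (("HTK", htk), ("HTS", hts)):
        corr, acc, wer = scores(counts)
        print(f"{name}: Corr={corr:.2f} Acc={acc:.2f} WER={wer:.2f}")
    rel = relative_wer_reduction(scores(htk)[2], scores(hts)[2])
    print(f"Relative WER reduction (HTS vs HTK): {rel:.2f}%")
```

Since N = H + D + S in every line above, 100 - Acc is the same as (D + S + I)/N, so either form can be used to cross-check the posted numbers.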

The experimental conditions were as follows:
Training Data: Resource Management (int_trn109: see RMHTK for details)
Test data:  feb89 feb91 oct89 sep92
Analysis:
MFCC
SOURCEKIND     = WAVEFORM
SOURCEFORMAT   = WAV
SOURCERATE     = 625
ZMEANSOURCE    = FALSE
TARGETKIND     = MFCC_E
TARGETRATE     = 100000
WINDOWSIZE     = 250000.0
PREEMCOEF      = 0.97
#USEPOWER       = FALSE
NUMCHANS       = 24
LPCORDER       = 12
CEPLIFTER      = 22
NUMCEPS        = 12
ENORMALISE     = TRUE
ESCALE         = 1.0
DELTAWINDOW    = 2
ACCWINDOW      = 2

PLP
SOURCEKIND     = WAVEFORM
SOURCEFORMAT   = WAV
SOURCERATE     = 625
ZMEANSOURCE    = FALSE
TARGETKIND     = PLP_0
TARGETRATE     = 100000
WINDOWSIZE     = 250000.0
PREEMCOEF      = 0.97
USEPOWER        = TRUE
NUMCHANS       = 24
LPCORDER       = 12
CEPLIFTER      = 22
NUMCEPS        = 12
ENORMALISE     = TRUE
ESCALE         = 1.0
DELTAWINDOW    = 2
ACCWINDOW      = 2

Feature: MFCC_E_D_A, PLP_0_D_A (static + delta + delta^2)
HMM: 3-state left-to-right speaker-independent HMM without skip paths
Training procedure and model topology: the same for HTK and HTS
Number of model parameters: the same for HTK and HTS
How HTK and HTS were compared: by replacing the binary files
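
As a quick sanity check on the analysis settings, the sketch below converts the HTK time parameters (which are given in units of 100 ns) into more familiar values and works out the feature dimension from the target kind. The helper names are hypothetical, and the dimension rule assumes NUMCEPS static cepstra plus one extra coefficient for each of _E and _0, repeated for each of static, _D and _A.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # HTK times are expressed in 100 ns units

def timing(config):
    """Convert SOURCERATE, TARGETRATE and WINDOWSIZE into Hz and milliseconds."""
    return {
        "sample_rate_hz": HTK_UNITS_PER_SECOND / config["SOURCERATE"],
        "frame_shift_ms": config["TARGETRATE"] / 10_000,
        "window_ms": config["WINDOWSIZE"] / 10_000,
    }

def feature_dim(target_kind, numceps):
    """Vector size for a kind such as MFCC_E_D_A or PLP_0_D_A."""
    qualifiers = target_kind.split("_")[1:]
    static = numceps + ("E" in qualifiers) + ("0" in qualifiers)
    blocks = 1 + ("D" in qualifiers) + ("A" in qualifiers)  # static, delta, delta^2
    return static * blocks

if __name__ == "__main__":
    # Values taken from the configuration listings above.
    config = {"SOURCERATE": 625, "TARGETRATE": 100000, "WINDOWSIZE": 250000.0}
    print(timing(config))
    print(feature_dim("MFCC_E_D_A", 12), feature_dim("PLP_0_D_A", 12))
```

With the settings above this corresponds to 16 kHz audio, a 25 ms window with a 10 ms shift, and 39-dimensional vectors for both front ends, which matches the MFCC(39) and PLP(39) labels in the results.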

Junichi Yamagishi
CSTR
