From: "Hui LIANG (@Idiap)" <hui.liang@xxxxxxxx>
Date: 1 December 2009 14:32:52 GMT
To: Junichi Yamagishi <jyamagis@xxxxxxxxxxxx>, uratec@xxxxxxxxxxxxxxx
Subject: A bug? Or my misunderstanding?
Hello, Yamagishi-san and Oura-san,
I am looking into the HTS code related to the RC command of HHEd. I
noticed that one statement is missing from the function CalcDistance() in
HHEd.c, and I don't understand why.
*****************************************
void CalcDistance (CoList *list, RNode *ch1, RNode *ch2)
{
   ......
      if (score1 < score2) {
         if (acc != NULL) {
            ch1->clustAcc += acc->occ;
            ch1->clusterScore += (acc->occ * score1);
            for (k=1; k<=vSize; k++)
               sum1[k] += acc->sum[k];
         }
      }
      else {
         if (acc != NULL) {
            ch2->clustAcc += acc->occ;
            ch2->clusterScore += (acc->occ * score2);   /* (***) */
            for (k=1; k<=vSize; k++)
               sum2[k] += acc->sum[k];
         }
      }
   }
   ......
}
*****************************************
The statement marked (***) is present in HHEd.c of HTK but missing from
HHEd.c of HTS. I checked the HTS patch
(http://hts.sp.nitech.ac.jp/archives/2.1/HTS-2.1_for_HTK-3.4.tar.bz2)
and found that the statement is deleted in the patch. The statement is
also missing from the HHEd.c on the EMIME SVN server.
Without the statement, regression tree building looks wrong (the score
of Cluster 2 is always 0.000000e+00):
*****************************************
RC 32 reg
Building regression tree with 32 terminals (4 streams)
Creating regression class tree with ident reg.tree and baseclass reg.base
Splitting Node 1, score 1.000000e+10 (Stream splitting)
Splitting Node 3, score 1.000000e+10 (Stream splitting)
Splitting Node 5, score 1.000000e+10 (Stream splitting)
Splitting Node 7, score 1.000000e+10 (MSD splitting)
Splitting Node 6, score 1.000000e+10 (MSD splitting)
Splitting Node 4, score 1.000000e+10 (MSD splitting)
Splitting Node 2, score 6.892598e+06 (Stream=1, vSize=75)
Iteration 1: Distance = 7.713607e-01
Iteration 2: Distance = 7.697198e-01, Delta = 1.640893e-03
Iteration 3: Distance = 8.384113e-01, Delta = -6.869147e-02
Cluster 1: Score 3.155571e+06, Occ 2.473118e+06 Cluster 2: Score 0.000000e+00, Occ 1.290632e+06
Splitting Node 14, score 3.528862e+06 (Stream=1, vSize=75)
Iteration 1: Distance = 7.066840e-01
Iteration 2: Distance = 7.420171e-01, Delta = -3.533309e-02
Cluster 1: Score 1.933253e+06, Occ 1.871265e+06 Cluster 2: Score 0.000000e+00, Occ 7.341370e+05
Splitting Node 16, score 2.048896e+06 (Stream=1, vSize=75)
Iteration 1: Distance = 5.623881e-01
Iteration 2: Distance = 4.876305e-01, Delta = 7.475763e-02
Iteration 3: Distance = 4.743098e-01, Delta = 1.332077e-02
Iteration 4: Distance = 4.678290e-01, Delta = 6.480706e-03
Iteration 5: Distance = 4.638863e-01, Delta = 3.942776e-03
Iteration 6: Distance = 4.610623e-01, Delta = 2.824021e-03
Iteration 7: Distance = 4.608872e-01, Delta = 1.750366e-04
Iteration 8: Distance = 4.621481e-01, Delta = -1.260920e-03
Cluster 1: Score 9.051442e+05, Occ 1.068847e+06 Cluster 2: Score 0.000000e+00, Occ 8.897113e+05
*****************************************
So I suppose this is a bug. Please let me know your opinion.
Thanks a lot!
Best regards,
Hui