[hts-users:02303] Fwd: A bug? Or my misunderstanding?


Hi,

Hui (from IDIAP) found a bug in the RC command of HHEd.
Please see below for details.

Ta,
Junichi

Begin forwarded message:

From: "Hui LIANG (@Idiap)" <hui.liang@xxxxxxxx>
Date: 1 December 2009 14:32:52 GMT
To: Junichi Yamagishi <jyamagis@xxxxxxxxxxxx>, uratec@xxxxxxxxxxxxxxx
Subject: A bug? Or my misunderstanding?

Hello, Yamagishi-san and Oura-san,

I am looking into the HTS code for the RC command of HHEd. I noticed
that one statement is absent from the function CalcDistance() in
HHEd.c, and I don't understand why.

*****************************************
void CalcDistance (CoList *list, RNode *ch1, RNode *ch2)
{
......
     if (score1 < score2) {
        if (acc != NULL) {
           ch1->clustAcc += acc->occ;
           ch1->clusterScore += (acc->occ * score1);
           for (k=1; k<=vSize; k++)
              sum1[k] += acc->sum[k];
        }
     }
     else {
        if (acc != NULL) {
           ch2->clustAcc += acc->occ;
           ch2->clusterScore += (acc->occ * score2); (***)
           for (k=1; k<=vSize; k++)
              sum2[k] += acc->sum[k];
        }
     }
  }
......
}
*****************************************

The statement (***) is missing from HHEd.c of HTS but is present in
HHEd.c of HTK. I checked the HTS patch
(http://hts.sp.nitech.ac.jp/archives/2.1/HTS-2.1_for_HTK-3.4.tar.bz2)
and the statement was deleted from the patch. The statement is also
missing from the HHEd.c in the EMIME SVN server.

Without the statement, the regression tree building output looks wrong:
the score of Cluster 2 is always 0 even though its occupancy is
non-zero:

*****************************************
RC 32 reg
Building regression tree with 32 terminals (4 streams)
Creating regression class tree with ident reg.tree and baseclass reg.base
Splitting Node 1, score 1.000000e+10 (Stream splitting)
Splitting Node 3, score 1.000000e+10 (Stream splitting)
Splitting Node 5, score 1.000000e+10 (Stream splitting)
Splitting Node 7, score 1.000000e+10 (MSD splitting)
Splitting Node 6, score 1.000000e+10 (MSD splitting)
Splitting Node 4, score 1.000000e+10 (MSD splitting)
Splitting Node 2, score 6.892598e+06 (Stream=1, vSize=75)
Iteration 1: Distance = 7.713607e-01
Iteration 2: Distance = 7.697198e-01, Delta = 1.640893e-03
Iteration 3: Distance = 8.384113e-01, Delta = -6.869147e-02
Cluster 1: Score 3.155571e+06, Occ 2.473118e+06     Cluster 2: Score 0.000000e+00, Occ 1.290632e+06
Splitting Node 14, score 3.528862e+06 (Stream=1, vSize=75)
Iteration 1: Distance = 7.066840e-01
Iteration 2: Distance = 7.420171e-01, Delta = -3.533309e-02
Cluster 1: Score 1.933253e+06, Occ 1.871265e+06     Cluster 2: Score 0.000000e+00, Occ 7.341370e+05
Splitting Node 16, score 2.048896e+06 (Stream=1, vSize=75)
Iteration 1: Distance = 5.623881e-01
Iteration 2: Distance = 4.876305e-01, Delta = 7.475763e-02
Iteration 3: Distance = 4.743098e-01, Delta = 1.332077e-02
Iteration 4: Distance = 4.678290e-01, Delta = 6.480706e-03
Iteration 5: Distance = 4.638863e-01, Delta = 3.942776e-03
Iteration 6: Distance = 4.610623e-01, Delta = 2.824021e-03
Iteration 7: Distance = 4.608872e-01, Delta = 1.750366e-04
Iteration 8: Distance = 4.621481e-01, Delta = -1.260920e-03
Cluster 1: Score 9.051442e+05, Occ 1.068847e+06     Cluster 2: Score 0.000000e+00, Occ 8.897113e+05
*****************************************
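
To illustrate why Cluster 2 can show a zero score with a non-zero
occupancy, here is a minimal sketch of the accumulation step. It is a
simplified stand-in, not the actual HHEd.c code: only the field names
occ, clustAcc and clusterScore come from the excerpt above, while the
struct layouts, the function names, the update_score2 flag and the toy
numbers are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced versions of the accumulator and cluster node. */
typedef struct {
    double occ;            /* occupation count of one component */
} Accum;

typedef struct {
    double clustAcc;       /* total occupancy assigned to the cluster */
    double clusterScore;   /* occupancy-weighted sum of distances */
} Cluster;

/* Assign one component to the closer of two clusters.
 * update_score2 == 0 mimics the code with the (***) statement missing. */
void assign_component(Cluster *c1, Cluster *c2, const Accum *acc,
                      double score1, double score2, int update_score2)
{
    assert(c1 != NULL && c2 != NULL);
    if (acc == NULL)
        return;
    if (score1 < score2) {
        c1->clustAcc += acc->occ;
        c1->clusterScore += acc->occ * score1;
    } else {
        c2->clustAcc += acc->occ;
        if (update_score2)
            c2->clusterScore += acc->occ * score2;   /* the (***) line */
    }
}

/* Run three made-up components through the assignment and return
 * the resulting statistics of cluster 2. */
Cluster toy_cluster2(int update_score2)
{
    const Accum  accs[3]   = { {2.0}, {1.0}, {4.0} };
    const double score1[3] = { 0.5, 3.0, 2.0 };
    const double score2[3] = { 1.5, 1.0, 0.5 };
    Cluster c1 = { 0.0, 0.0 };
    Cluster c2 = { 0.0, 0.0 };
    size_t i;

    for (i = 0; i < 3; i++)
        assign_component(&c1, &c2, &accs[i], score1[i], score2[i],
                         update_score2);
    return c2;
}
```

In this toy setup, toy_cluster2(0) still accumulates occupancy for
cluster 2 but leaves its clusterScore at zero, which is the same
pattern as in the log above, whereas toy_cluster2(1) yields a non-zero
score.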

So I suppose this is a bug. Please let me know your opinion.

Thanks a lot!

Best regards,
Hui

