
[hts-users:03459] Probable little bug in hts_engine API version 1.06


Hi, members of the HTS working group,

 

   I think I have found a small bug in hts_engine API version 1.06. When generating parameters from full-state-aligned label files, the natural input label file for hts_engine is the output of HSMMAlign with full state alignment enabled, but the two tools assume slightly different label file formats.

 

   The output of HSMMAlign (version 2.2) with full state alignment looks like this:

t0 t1 full_label1[2] full_label1
t1 t2 full_label1[3]
t2 t3 full_label1[4]
t3 t4 full_label2[2] full_label2
t4 t5 full_label2[3]
t5 t6 full_label2[4]
...

   However, hts_engine with the -vp switch only works correctly with phoneme-level alignment, whose format looks like this:

t0 t3 full_label1
t3 t6 full_label2
...
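Until the engine accepts the full-state format directly, one possible workaround is to collapse the HSMMAlign output into the phoneme-level format before calling hts_engine. The C sketch below is not part of hts_engine; it assumes whitespace-separated fields, integer times in HTK units (100 ns), and that the trailing fourth field appears only on the first state of each phoneme, as in the sample output. The names `PhoneLabel` and `collapse_state_lines` are hypothetical. Note that this discards the state boundaries inside each phoneme, so the engine would fall back to spreading each phoneme's frames over its states (the `HTS_set_duration` call that the patch below comments out), which is exactly what the patch is meant to avoid.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 256

/* One phoneme-level entry in the "start end label" format accepted by -vp. */
typedef struct {
   long start;            /* start time of the phoneme's first state */
   long end;              /* end time of the phoneme's last state */
   char name[MAX_NAME];   /* phoneme-level full-context label */
} PhoneLabel;

/* Collapse full-state-aligned lines into phoneme-level entries.
   A line with four fields opens a new phoneme (its fourth field is the
   phoneme label); a line with three fields is a later state of the current
   phoneme and only extends its end time.
   Returns the number of entries written, or -1 on malformed input or if
   more than max_out phonemes are found. */
static int collapse_state_lines(const char *const *lines, int nlines,
                                PhoneLabel *out, int max_out)
{
   int n = 0;
   assert(nlines == 0 || (lines != NULL && out != NULL));
   for (int i = 0; i < nlines; i++) {
      long start, end;
      char state_name[MAX_NAME], phone_name[MAX_NAME];
      int fields = sscanf(lines[i], "%ld %ld %255s %255s",
                          &start, &end, state_name, phone_name);
      if (fields == 4) {                 /* first state of a new phoneme */
         if (n == max_out)
            return -1;
         out[n].start = start;
         out[n].end = end;
         strcpy(out[n].name, phone_name);
         n++;
      } else if (fields == 3 && n > 0) { /* later state: extend the phoneme */
         out[n - 1].end = end;
      } else {
         return -1;
      }
   }
   return n;
}
```

With times such as `0 50000 a[2] a`, `50000 100000 a[3]`, `100000 150000 a[4]`, the three state lines collapse to a single entry `0 150000 a`.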

 

   When I synthesize with hts_engine with the -vp switch turned on and full-state-aligned label files as input, the synthesized speech is incorrect.

 

   I wrote a piece of code that handles label files with full state alignment correctly, which may be useful to you. It goes in HTS_SStreamSet_create():

 

   if (HTS_Label_get_frame_specified_flag(label)) {

      /* use duration set by user */

      /*
         This block is rewritten to assign state durations correctly in the
         full-state-alignment case. It works as follows:

         Step 1. Calculate the specified duration of each state in the state
                 stream from label files that include state boundary times.
                 Such label files are typically the output of HSMMAlign.

         Step 2. Reformat the label list from the full-state-alignment format
                 to the format without alignment information by deleting the
                 extra label entries, e.g.:

            Sample output of HSMMAlign            Hand deleted              Reformatted label list
            t0 t1 full_label1[2] full_label1  =>  t0 t1 full_label1[2]  =>  full_label1[2]
            t1 t2 full_label1[3]              =>  t1 t2 full_label1[3]  =>  (deleted)
            t2 t3 full_label1[4]              =>  t2 t3 full_label1[4]  =>  (deleted)
            t3 t4 full_label2[2] full_label2  =>  t3 t4 full_label2[2]  =>  full_label2[2]
            t4 t5 full_label2[3]              =>  t4 t5 full_label2[3]  =>  (deleted)
            t5 t6 full_label2[4]              =>  t5 t6 full_label2[4]  =>  (deleted)
            ...

            Ignoring the state suffix (see Note 1), the final label list is:
            full_label1
            full_label2
            ...

         Notes: 1. State suffixes (e.g. [2]) are not removed; they should not
                   interfere with the subsequent parameter generation.
                2. Boundary times (e.g. t0) are ignored rather than actually
                   removed.
      */

      int startFrame = 0, endFrame = 0, j = 0;
      HTS_LabelString *labelString, *labelStringToFree;

      /* Step 1: calculate the specified state durations. */
      for (i = 0; i < HTS_Label_get_size(label); i++) {
         endFrame = (int) (HTS_Label_get_end_frame(label, i) + 0.5);
         sss->duration[i] = endFrame - startFrame;
         startFrame = endFrame;
      }

      /* Step 2: reformat the label list for the subsequent parameter
         generation: keep the first entry of each phoneme, then unlink and
         free its remaining nstate - 1 entries. */
      labelString = label->head;
      for (i = 0; i < HTS_Label_get_size(label) / sss->nstate; i++) {
         for (j = 2; j < sss->nstate + 1; j++) {
            labelStringToFree = labelString->next;
            labelString->next = labelStringToFree->next;
            HTS_free(labelStringToFree->name);
            HTS_free(labelStringToFree);
         }
         labelString = labelString->next;
      }
      label->size /= sss->nstate;

//       next_time = 0;
//       next_state = 0;
//       state = 0;
//       for (i = 0; i < HTS_Label_get_size(label); i++) {
//          temp = HTS_Label_get_end_frame(label, i);
//          if (temp >= 0) {
//             next_time += HTS_set_duration(&sss->duration[next_state], &duration_mean[next_state], &duration_vari[next_state], state + sss->nstate - next_state, temp - next_time);
//             next_state = state + sss->nstate;
//          } else if (i + 1 == HTS_Label_get_size(label)) {
//             HTS_error(-1, "HTS_SStreamSet_create: The time of final label is not specified.\n");
//             HTS_set_duration(&sss->duration[next_state], &duration_mean[next_state], &duration_vari[next_state], state + sss->nstate - next_state, 0.0);
//          }
//          state += sss->nstate;
//       }

    }
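Steps 1 and 2 can also be tried outside hts_engine. The sketch below is a standalone C version under stated assumptions: `LabelNode` is a hypothetical stand-in for `HTS_LabelString` that keeps only the `name` and `next` fields the patch touches, end times are already converted to frames, and plain `malloc`/`free` replace the engine's allocation helpers. `keep_first_state` frees every entry it unlinks and checks that the list length is a multiple of the state count before touching the list; `make_list` and `free_list` are demonstration helpers only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for HTS_LabelString: only the fields the patch uses. */
typedef struct LabelNode {
   char *name;
   struct LabelNode *next;
} LabelNode;

/* Step 1: per-state durations (in frames) from cumulative end frames.
   Each boundary is rounded, so the durations always sum to the rounded
   final end frame. Assumes the first state starts at frame 0 and that
   end_frame[] is non-negative and non-decreasing. */
static void state_durations(const double *end_frame, int n, int *duration)
{
   int start = 0;
   for (int i = 0; i < n; i++) {
      int end = (int) (end_frame[i] + 0.5);   /* round to nearest frame */
      duration[i] = end - start;
      start = end;
   }
}

/* Step 2: keep the first entry of each group of nstate entries and free
   the other nstate - 1. Returns the new length, or -1 (list untouched)
   if size is not a multiple of nstate. */
static int keep_first_state(LabelNode *head, int size, int nstate)
{
   assert(nstate > 0);
   if (size % nstate != 0)
      return -1;
   LabelNode *node = head;
   for (int i = 0; i < size / nstate; i++) {
      for (int j = 1; j < nstate; j++) {
         LabelNode *drop = node->next;   /* unlink the next state entry */
         node->next = drop->next;
         free(drop->name);
         free(drop);
      }
      node = node->next;                 /* first entry of the next phoneme */
   }
   return size / nstate;
}

/* Demonstration helper: build a list holding copies of names[0..n-1]. */
static LabelNode *make_list(const char *const *names, int n)
{
   LabelNode *head = NULL, **tail = &head;
   for (int i = 0; i < n; i++) {
      LabelNode *node = malloc(sizeof *node);
      assert(node != NULL);
      size_t len = strlen(names[i]) + 1;
      node->name = malloc(len);
      assert(node->name != NULL);
      memcpy(node->name, names[i], len);
      node->next = NULL;
      *tail = node;
      tail = &node->next;
   }
   return head;
}

/* Demonstration helper: free every node and its name. */
static void free_list(LabelNode *node)
{
   while (node != NULL) {
      LabelNode *next = node->next;
      free(node->name);
      free(node);
      node = next;
   }
}
```

For example, end frames 1.2, 2.6, 4.4 and 5.5 give durations 1, 2, 1 and 2, which sum to 6, the rounded final end frame. Rounding each cumulative boundary rather than each individual duration keeps rounding errors from accumulating across states.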


YangWang


Follow-Ups
[hts-users:03460] Re: Probable little bug in hts_engine API version 1.06, Heiga ZEN (Byung Ha CHUN)