
[hts-users:03460] Re: Probable little bug in hts_engine API version 1.06


hts_engine does not support state-level alignments as input labels.
The -vp option tells it to use the phoneme-level alignments given in
the input labels, not state-level ones.  Note that HSMMAlign can also
produce phoneme-level alignments.  If you do want to use state-level
alignments, you can use HMGenS.
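
For state-aligned labels that already exist, one option is to collapse
them into phoneme-level lines before passing them to hts_engine with -vp.
The following is only a minimal sketch in C and is not part of HTS or
hts_engine: the function name collapse_state_alignment is hypothetical,
and it assumes the state-level layout shown in the quoted message below,
where the first state line of each phoneme carries the phoneme label as a
fourth field.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 1024

/* Append one phoneme-level line to `out`; returns -1 if it does not fit. */
static int emit(char *out, size_t out_size, size_t *used,
                long long start, long long end, const char *name)
{
    int n = snprintf(out + *used, out_size - *used,
                     "%lld %lld %s\n", start, end, name);
    if (n < 0 || (size_t) n >= out_size - *used)
        return -1;
    *used += (size_t) n;
    return 0;
}

/* Hypothetical helper: collapse state-level alignment text `in` into
 * phoneme-level lines in `out`. Assumes a line with four fields starts a
 * new phoneme and a line with three fields continues the current one.
 * Returns the number of phoneme lines, or -1 on malformed input or if
 * `out` is too small. */
int collapse_state_alignment(const char *in, char *out, size_t out_size)
{
    long long start, end, ph_start = 0, ph_end = 0;
    char state_name[MAX_NAME], ph_name[MAX_NAME], cur_name[MAX_NAME];
    int count = 0, open = 0;
    size_t used = 0;

    assert(in != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    while (*in != '\0') {
        /* Copy the current line (without its newline) into a buffer. */
        char line[4 * MAX_NAME];
        size_t len = strcspn(in, "\n");
        int fields;

        if (len >= sizeof line)
            return -1;
        memcpy(line, in, len);
        line[len] = '\0';
        in += len;
        if (*in == '\n')
            in++;

        /* Skip blank lines. */
        if (line[strspn(line, " \t\r")] == '\0')
            continue;

        fields = sscanf(line, "%lld %lld %1023s %1023s",
                        &start, &end, state_name, ph_name);
        if (fields < 3)
            return -1;

        if (fields == 4) {
            /* First state of a new phoneme: flush the previous one. */
            if (open) {
                if (emit(out, out_size, &used, ph_start, ph_end, cur_name) != 0)
                    return -1;
                count++;
            }
            ph_start = start;
            strcpy(cur_name, ph_name);
            open = 1;
        } else if (!open) {
            /* A continuation line before any phoneme has started. */
            return -1;
        }
        ph_end = end;
    }

    /* Flush the last phoneme, if any. */
    if (open) {
        if (emit(out, out_size, &used, ph_start, ph_end, cur_name) != 0)
            return -1;
        count++;
    }
    return count;
}
```

Given the six state lines of the quoted example, such a function would
write two lines, "t0 t3 full_label1" and "t3 t6 full_label2", which is
the format that -vp reads.  Producing phoneme-level alignments directly
with HSMMAlign avoids this step entirely.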

Regards,

Heiga


2012/11/15 王洋 <yangwang84@xxxxxxxxx>:
> Hi, HTS working group members,
>
>    I think I have found a small bug in hts_engine API version 1.06. When
> doing parameter generation from label files with full state alignment, the
> likely input label file for hts_engine is the one written by HSMMAlign with
> its full state alignment function, but the label file formats assumed by the
> two tools are slightly incompatible.
>
>    The output of HSMMAlign (version 2.2) with full state alignment looks like
> this:
>
> t0 t1 full_label1[2] full_label1
> t1 t2 full_label1[3]
> t2 t3 full_label1[4]
> t3 t4 full_label2[2] full_label2
> t4 t5 full_label2[3]
> t5 t6 full_label2[4]
> ...
>
>    But the input format that hts_engine expects with the -vp switch only
> works correctly for phoneme alignment, which looks like this:
>
> t0 t3 full_label1
> t3 t6 full_label2
> ...
>
>
>    When I synthesize with hts_engine with the -vp switch turned on and label
> files with full state alignment, I get incorrect synthesized speech samples.
>
>    I wrote a piece of code that works correctly for label files with full
> state alignment, which may be useful for you. It looks like this (in
> HTS_SStreamSet_create()):
>
>
>    if (HTS_Label_get_frame_specified_flag(label)) {
>       /* use duration set by user */
>
>       /*
>          This block is rewritten to assign state durations correctly in
>          the full state alignment case. It works as follows:
>
>          Step 1. Calculate the specified duration of each state in the
>                  state stream from label files that include phoneme
>                  boundary information. Such label files are probably
>                  the output of HSMMAlign.
>
>          Step 2. Reformat the label strings from the full state alignment
>                  format to the format without alignment information by
>                  deleting the extra label entries, e.g.:
>
>             Sample output by HSMMAlign           Hand deleted             Reformatted label string
>             t0 t1 full_label1[2] full_label1 =>  t0 t1 full_label1[2] =>  full_label1[2]
>             t1 t2 full_label1[3]             =>  t1 t2 full_label1[3] =>  (null)
>             t2 t3 full_label1[4]             =>  t2 t3 full_label1[4] =>  (null)
>             t3 t4 full_label2[2] full_label2 =>  t3 t4 full_label2[2] =>  full_label2[2]
>             t4 t5 full_label2[3]             =>  t4 t5 full_label2[3] =>  (null)
>             t5 t6 full_label2[4]             =>  t5 t6 full_label2[4] =>  (null)
>             ...
>
>             The final reformatted label string should be:
>             full_label1
>             full_label2
>             ...
>
>          Notes:  1. State-related characters (e.g. [2]) are not removed;
>                     this should not interfere with the following
>                     parameter generation procedure.
>                  2. Phoneme boundary information (e.g. t0) is ignored
>                     rather than actually removed.
>       */
>
>
>       int startFrame = 0, endFrame = 0, j = 0;
>       HTS_LabelString *labelString, *labelStringToFree;
>
>       /* Calculate the specified state duration. */
>       for (i = 0; i < HTS_Label_get_size(label); i++) {
>          endFrame = (int) (HTS_Label_get_end_frame(label, i) + 0.5);
>          sss->duration[i] = endFrame - startFrame;
>          startFrame = endFrame;
>       }
>
>       /* Reformat label string for the following parameter generation procedure. */
>       labelString = label->head;
>       for (i = 0; i < HTS_Label_get_size(label) / sss->nstate; i++) {
>          /* Unlink the nstate - 1 entries that follow the first state of this phoneme. */
>          for (j = 2; j < sss->nstate + 1; j++) {
>             labelStringToFree = labelString->next;
>             labelString->next = labelString->next->next;
>
>             /* Free every unlinked entry except the last one of each phoneme. */
>             if (j < sss->nstate) {
>                HTS_free(labelStringToFree->name);
>                HTS_free(labelStringToFree);
>             }
>          }
>          labelString = labelString->next;
>       }
>       label->size /= sss->nstate;
>
>
> //       next_time = 0;
> //       next_state = 0;
> //       state = 0;
> //       for (i = 0; i < HTS_Label_get_size(label); i++) {
> //          temp = HTS_Label_get_end_frame(label, i);
> //          if (temp >= 0) {
> //             next_time += HTS_set_duration(&sss->duration[next_state], &duration_mean[next_state], &duration_vari[next_state], state + sss->nstate - next_state, temp - next_time);
> //             next_state = state + sss->nstate;
> //          } else if (i + 1 == HTS_Label_get_size(label)) {
> //             HTS_error(-1, "HTS_SStreamSet_create: The time of final label is not specified.\n");
> //             HTS_set_duration(&sss->duration[next_state], &duration_mean[next_state], &duration_vari[next_state], state + sss->nstate - next_state, 0.0);
> //          }
> //          state += sss->nstate;
> //       }
>    }
>
> YangWang



-- 
Heiga ZEN (in Japanese)
Byung Ha CHUN (in Korean)
<heigazen@xxxxxxxxxx>

Follow-Ups
[hts-users:03461] Re: Probable little bug in hts_engine API version 1.06, Keiichiro Oura
References
[hts-users:03459] Probable little bug in hts_engine API version 1.06, 王洋