
[hts-users:03461] Re: Probable little bug in hts_engine API version 1.06


Hi,

As Heiga said, the hts_engine API does not support state-level
alignments as input labels.
If you want to obtain state-level alignments, please use HSMMAlign
with the -f option for HSMM forced alignment.
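Since the -vp option reads phoneme-level alignments, state-level output such as the HSMMAlign 2.2 sample quoted below can, in principle, be merged back into one line per phoneme before synthesis. The following is a minimal C sketch of that merge; it is not part of hts_engine or HTS. It assumes each line has the form "start end state_name [phoneme_name]" and that only the first state line of each phoneme carries the fourth field, as in that sample. The names merge_state_alignment(), PhoneLabel and MAX_NAME are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 1024   /* hypothetical buffer size; matches %1023s below */

/* One phoneme-level entry; times are kept in the label file's units. */
typedef struct {
   long start;
   long end;
   char name[MAX_NAME];
} PhoneLabel;

/* Hypothetical helper (not an hts_engine function): merges state-level lines
   "start end state_name [phoneme_name]" into phoneme-level entries
   "start end phoneme_name". A line with a fourth field is assumed to be the
   first state of a new phoneme; each following line without one extends it.
   Returns the number of phonemes written to out, or -1 on error. */
static int merge_state_alignment(const char *const *lines, int n,
                                 PhoneLabel *out, int max_out)
{
   int count = 0;

   assert(lines != NULL && out != NULL);
   for (int i = 0; i < n; i++) {
      char state[MAX_NAME], phone[MAX_NAME];
      long start, end;
      int fields = sscanf(lines[i], "%ld %ld %1023s %1023s",
                          &start, &end, state, phone);

      if (fields < 3)
         return -1;                 /* malformed or empty line */
      if (fields == 4) {            /* first state of a new phoneme */
         if (count == max_out)
            return -1;              /* output array is full */
         out[count].start = start;
         strcpy(out[count].name, phone);
         count++;
      } else if (count == 0) {
         return -1;                 /* state line before any phoneme name */
      }
      out[count - 1].end = end;     /* the latest state ends the phoneme */
   }
   return count;
}
```

For example, six state lines running from 0 to 300000 (three states of full_label1, then three of full_label2, with the phoneme boundary at 150000) collapse into two entries, "0 150000 full_label1" and "150000 300000 full_label2", which is the phoneme-level layout quoted below. As Heiga notes, HSMMAlign can also produce phoneme-level alignments directly, which avoids this step.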

Regards,
Keiichiro Oura


2012/11/15 Heiga ZEN (Byung Ha CHUN) <heigazen@xxxxxxxxxx>:
> hts_engine does not support state-level alignments as input labels;
> the -vp option turns on the use of phoneme-level alignments from input
> labels, rather than state-level ones.  Note that HSMMAlign can produce
> phoneme-level alignments.  If you do want to use state-level
> alignments, you can use HMGenS.
>
> Regards,
>
> Heiga
>
>
> 2012/11/15 王洋 <yangwang84@xxxxxxxxx>:
>> Hi, HTS working group members,
>>
>> I think I found a small bug in hts_engine API version 1.06. When
>> generating parameters from labels with full state alignments, the natural
>> input label file for hts_engine is the output of HSMMAlign's full state
>> alignment function, but the two tools assume slightly incompatible label
>> file formats.
>>
>> The output of HSMMAlign (version 2.2) with full state alignment looks like
>> this:
>>
>> t0 t1 full_label1[2] full_label1
>> t1 t2 full_label1[3]
>> t2 t3 full_label1[4]
>> t3 t4 full_label2[2] full_label2
>> t4 t5 full_label2[3]
>> t5 t6 full_label2[4]
>> ...
>>
>> However, hts_engine's -vp switch only works correctly with phoneme
>> alignments, whose format looks like this:
>>
>> t0 t3 full_label1
>> t3 t6 full_label2
>> ...
>>
>> When I synthesize with hts_engine with the -vp switch turned on and
>> full-state-aligned label files, the synthesized speech is incorrect.
>>
>> I wrote a piece of code that works correctly with label files containing
>> full state alignments, which may be useful to you. Here it is (in
>> HTS_SStreamSet_create()):
>>
>>    if (HTS_Label_get_frame_specified_flag(label)) {
>>       /* use duration set by user */
>>
>>       /*
>>          This block of code is rewritten to correctly assign state
>>          durations in the full state alignment case. It works as follows:
>>
>>          Step 1. Calculate the specified duration of each state in the
>>                  state stream from label files that include time boundary
>>                  information. Such label files are probably the output of
>>                  HSMMAlign.
>>
>>          Step 2. Reformat the label string from the full state alignment
>>                  format to a format without alignment information by
>>                  deleting the extra label entries, e.g.:
>>
>>          Sample output by HSMMAlign            Hand deleted              Reformatted label string
>>          t0 t1 full_label1[2] full_label1  =>  t0 t1 full_label1[2]  =>  full_label1[2]
>>          t1 t2 full_label1[3]              =>  t1 t2 full_label1[3]  =>  (null)
>>          t2 t3 full_label1[4]              =>  t2 t3 full_label1[4]  =>  (null)
>>          t3 t4 full_label2[2] full_label2  =>  t3 t4 full_label2[2]  =>  full_label2[2]
>>          t4 t5 full_label2[3]              =>  t4 t5 full_label2[3]  =>  (null)
>>          t5 t6 full_label2[4]              =>  t5 t6 full_label2[4]  =>  (null)
>>                                ...
>>
>>          The final reformatted label string should be:
>>
>>          full_label1
>>          full_label2
>>          ...
>>
>>          Notes: 1. State-specific suffixes (e.g. [2]) are not removed;
>>                    they should not interfere with the subsequent
>>                    parameter generation procedure.
>>                 2. Time boundary information (e.g. t0) is ignored rather
>>                    than actually removed.
>>       */
>>
>>       int startFrame = 0, endFrame = 0, j = 0;
>>       HTS_LabelString *labelString, *labelStringToFree;
>>       /* i, sss and label are declared by the enclosing
>>          HTS_SStreamSet_create() */
>>
>>       /* Step 1: calculate the specified state durations. */
>>       for (i = 0; i < HTS_Label_get_size(label); i++) {
>>          endFrame = (int) (HTS_Label_get_end_frame(label, i) + 0.5);
>>          sss->duration[i] = endFrame - startFrame;
>>          startFrame = endFrame;
>>       }
>>
>>       /* Step 2: reformat the label string for the following parameter
>>          generation procedure. */
>>       labelString = label->head;
>>       for (i = 0; i < HTS_Label_get_size(label) / sss->nstate; i++) {
>>          /* unlink the nstate - 1 entries after the first state of this
>>             phoneme and free every one of them, so none is leaked */
>>          for (j = 2; j < sss->nstate + 1; j++) {
>>             labelStringToFree = labelString->next;
>>             labelString->next = labelString->next->next;
>>             HTS_free(labelStringToFree->name);
>>             HTS_free(labelStringToFree);
>>          }
>>          labelString = labelString->next;
>>       }
>>       label->size /= sss->nstate;
>>
>> //       next_time = 0;
>> //       next_state = 0;
>> //       state = 0;
>> //       for (i = 0; i < HTS_Label_get_size(label); i++) {
>> //          temp = HTS_Label_get_end_frame(label, i);
>> //          if (temp >= 0) {
>> //             next_time += HTS_set_duration(&sss->duration[next_state],
>> //                                           &duration_mean[next_state],
>> //                                           &duration_vari[next_state],
>> //                                           state + sss->nstate - next_state,
>> //                                           temp - next_time);
>> //             next_state = state + sss->nstate;
>> //          } else if (i + 1 == HTS_Label_get_size(label)) {
>> //             HTS_error(-1, "HTS_SStreamSet_create: The time of final label is not specified.\n");
>> //             HTS_set_duration(&sss->duration[next_state],
>> //                              &duration_mean[next_state],
>> //                              &duration_vari[next_state],
>> //                              state + sss->nstate - next_state, 0.0);
>> //          }
>> //          state += sss->nstate;
>> //       }
>>    }
>>
>>
>> YangWang
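The two steps of the patch above can be tried outside hts_engine. The following minimal C sketch assumes a simplified, hypothetical LabelNode in place of HTS_LabelString (just a name, an end time assumed to be already in frames, and a next pointer) and a list holding exactly nstate entries per phoneme, as in the hand-deleted labels. The names new_node(), state_durations() and collapse_states() are illustrative, not hts_engine functions. Step 1 rounds each cumulative end frame and then takes differences, so rounding errors do not accumulate; step 2 keeps the first entry of every phoneme and frees each of the nstate - 1 entries it unlinks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for HTS_LabelString. */
typedef struct LabelNode {
   char *name;
   double end;                 /* end time, assumed to be in frames */
   struct LabelNode *next;
} LabelNode;

/* Allocates a node holding a private copy of name; returns NULL on failure. */
static LabelNode *new_node(const char *name, double end, LabelNode *next)
{
   LabelNode *node = malloc(sizeof *node);

   if (node == NULL)
      return NULL;
   node->name = malloc(strlen(name) + 1);
   if (node->name == NULL) {
      free(node);
      return NULL;
   }
   strcpy(node->name, name);
   node->end = end;
   node->next = next;
   return node;
}

/* Step 1: per-state durations from cumulative end frames. Rounding each
   cumulative end (not each duration) keeps the sum equal to the rounded
   final end frame. duration must have room for one int per node. */
static void state_durations(const LabelNode *head, int *duration)
{
   int start_frame = 0;
   int i = 0;

   for (const LabelNode *p = head; p != NULL; p = p->next, i++) {
      int end_frame = (int) (p->end + 0.5);
      duration[i] = end_frame - start_frame;
      start_frame = end_frame;
   }
}

/* Step 2: keep the first node of every group of nstate nodes and unlink
   and free the other nstate - 1. Assumes the list length is a multiple of
   nstate. Returns the number of nodes kept. */
static int collapse_states(LabelNode *head, int nstate)
{
   int kept = 0;

   assert(nstate > 0);
   for (LabelNode *p = head; p != NULL; p = p->next, kept++) {
      for (int j = 1; j < nstate; j++) {
         LabelNode *victim = p->next;
         assert(victim != NULL);   /* length must be a multiple of nstate */
         p->next = victim->next;
         free(victim->name);
         free(victim);
      }
   }
   return kept;
}
```

With nstate = 3 and end frames 2.4, 5.6, 9.5, 12.1, 15.0 and 18.49 for entries a[2] through b[4], state_durations() gives 2, 4, 4, 2, 3 and 3 frames, which sum to the rounded final end frame of 18, and collapse_states() leaves only a[2] and b[2].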
>
>
>
> --
> Heiga ZEN (in Japanese)
> Byung Ha CHUN (in Korean)
> <heigazen@xxxxxxxxxx>
>

References
[hts-users:03459] Probable little bug in hts_engine API version 1.06, 王洋
[hts-users:03460] Re: Probable little bug in hts_engine API version 1.06, Heiga ZEN (Byung Ha CHUN)