[hts-users:03461] Re: Probable little bug in hts_engine API version 1.06
- Subject: [hts-users:03461] Re: Probable little bug in hts_engine API version 1.06
- From: Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>
- Date: Thu, 15 Nov 2012 20:22:51 +0900
- Cc: uratec <uratec@xxxxxxxxxxxx>
- Delivered-to: hts-users@xxxxxxxxxxxxxxx
Hi,
As Heiga said, the hts_engine API does not support state-level alignments
as input labels.
If you want to obtain state-level alignments, please use HSMMAlign with
the -f option for HSMM forced alignment.
Regards,
Keiichiro Oura
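
Note: since hts_engine with -vp expects phoneme-level alignments, one possible workaround is to collapse state-level HSMMAlign output (in the format quoted below) into phoneme-level lines before synthesis. The following is only a minimal sketch in C. The function name collapse_state_labels and the buffer limits are hypothetical and are not part of HTS or the hts_engine API. It assumes that the first state line of each phoneme carries the phoneme-level label as a fourth field, and that times are integers.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 2048   /* assumed upper bound on one label line */
#define MAX_NAME 1024   /* assumed upper bound on one label string */

/* Append "start end name\n" to out; returns 0 on success, -1 if it does not fit. */
static int emit_phoneme(char *out, size_t out_size, size_t *used,
                        long long start, long long end, const char *name)
{
    int n = snprintf(out + *used, out_size - *used, "%lld %lld %s\n",
                     start, end, name);
    if (n < 0 || (size_t) n >= out_size - *used)
        return -1;
    *used += (size_t) n;
    return 0;
}

/* Hypothetical helper (not part of HTS or hts_engine API).
 * Collapses state-level lines "start end name[k] [phone]" into
 * phoneme-level lines "start end phone".
 * Returns the number of phoneme lines written, or -1 on error. */
int collapse_state_labels(const char *in, char *out, size_t out_size)
{
    char line[MAX_LINE], state_name[MAX_NAME], phone_name[MAX_NAME];
    char current[MAX_NAME] = "";
    long long start, end, cur_start = 0, cur_end = 0;
    size_t used = 0;
    int count = 0, have_phone = 0;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    while (*in != '\0') {
        size_t len = strcspn(in, "\n");
        int fields;

        if (len >= sizeof line)
            return -1;                  /* line too long for the buffer */
        memcpy(line, in, len);
        line[len] = '\0';
        in += len;
        if (*in == '\n')
            in++;

        fields = sscanf(line, "%lld %lld %1023s %1023s",
                        &start, &end, state_name, phone_name);
        if (fields == EOF)
            continue;                   /* blank line */
        if (fields < 3)
            return -1;                  /* malformed line */

        if (fields == 4) {              /* first state of a new phoneme */
            if (have_phone) {
                if (emit_phoneme(out, out_size, &used,
                                 cur_start, cur_end, current) != 0)
                    return -1;
                count++;
            }
            strcpy(current, phone_name);
            cur_start = start;
            cur_end = end;
            have_phone = 1;
        } else {                        /* later state of the same phoneme */
            if (!have_phone)
                return -1;              /* state line before any phoneme label */
            cur_end = end;
        }
    }

    if (have_phone) {                   /* flush the last phoneme */
        if (emit_phoneme(out, out_size, &used,
                         cur_start, cur_end, current) != 0)
            return -1;
        count++;
    }
    return count;
}
```

Each output line spans from the start of a phoneme's first state to the end of its last state, which should match the phoneme-level form that -vp reads. HSMMAlign can also produce phoneme-level alignments directly, so this is only useful when the state-level files already exist.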
2012/11/15 Heiga ZEN (Byung Ha CHUN) <heigazen@xxxxxxxxxx>:
> hts_engine does not support state-level alignments as input labels;
> -vp is to turn on using phoneme-level alignments from input labels,
> rather than state-level ones. Note that HSMMAlign can produce
> phoneme-level alignments. If you do want to use state-level
> alignments, you can use HMGenS.
>
> Regards,
>
> Heiga
>
>
> 2012/11/15 王洋 <yangwang84@xxxxxxxxx>:
>> Hi, HTS working group members,
>>
>> I think I have found a small bug in hts_engine API version 1.06. When doing
>> parameter generation with full state-aligned label files, the likely input
>> label file for hts_engine is the label file that HSMMAlign writes in full
>> state alignment mode, but the two tools assume slightly different label
>> file formats.
>>
>>
>>
>> The output of HSMMAlign (version 2.2) with full state alignment is like
>> this:
>>
>> t0 t1 full_label1[2] full_label1
>> t1 t2 full_label1[3]
>> t2 t3 full_label1[4]
>> t3 t4 full_label2[2] full_label2
>> t4 t5 full_label2[3]
>> t5 t6 full_label2[4]
>> ...
>>
>> However, hts_engine with the -vp switch only works correctly with
>> phoneme-level alignments, whose expected format looks like this:
>>
>> t0 t3 full_label1
>> t3 t6 full_label2
>> ...
>>
>>
>>
>> When I synthesize with hts_engine with the -vp switch turned on and full
>> state-aligned label files as input, I get incorrect synthesized speech.
>>
>>
>>
>> I wrote a piece of code that handles label files with full state
>> alignment correctly, which may be useful to you. It looks like this (in
>> HTS_SStreamSet_create()):
>>
>>
>>
>> if (HTS_Label_get_frame_specified_flag(label)) {
>>    /* use duration set by user */
>>
>>    /*
>>       This block is rewritten to assign state durations correctly in the
>>       full state alignment case. It works as follows:
>>
>>       Step 1. Calculate the specified duration of each state in the state
>>               stream from label files that include phoneme boundary
>>               information. Such label files are probably the output of
>>               HSMMAlign.
>>
>>       Step 2. Reformat the label strings from the full state alignment
>>               format to the format without alignment information by
>>               deleting the extra label entries, e.g.:
>>
>>       Sample output by HSMMAlign           Hand-deleted             Reformatted label string
>>       t0 t1 full_label1[2] full_label1  => t0 t1 full_label1[2]  => full_label1[2]
>>       t1 t2 full_label1[3]              => t1 t2 full_label1[3]  => (null)
>>       t2 t3 full_label1[4]              => t2 t3 full_label1[4]  => (null)
>>       t3 t4 full_label2[2] full_label2  => t3 t4 full_label2[2]  => full_label2[2]
>>       t4 t5 full_label2[3]              => t4 t5 full_label2[3]  => (null)
>>       t5 t6 full_label2[4]              => t5 t6 full_label2[4]  => (null)
>>       ...
>>
>>       The final reformatted label strings should be:
>>          full_label1
>>          full_label2
>>          ...
>>
>>       Notes: 1. State-related characters (e.g. [2]) are not removed; this
>>                 should not interfere with the subsequent parameter
>>                 generation procedure.
>>              2. Phoneme boundary information (e.g. t0) is ignored rather
>>                 than actually removed.
>>    */
>>
>>
>>
>>    int startFrame = 0, endFrame = 0, j = 0;
>>    HTS_LabelString *labelString, *labelStringToFree;
>>
>>    /* Calculate the specified state duration. */
>>    for ( i = 0; i < HTS_Label_get_size( label ); i++ )
>>    {
>>       endFrame = (int)( HTS_Label_get_end_frame( label, i ) + 0.5 );
>>       sss->duration[ i ] = endFrame - startFrame;
>>       startFrame = endFrame;
>>    }
>>
>>    /* Reformat label string for following parameter generation procedure. */
>>    labelString = label->head;
>>    for ( i = 0; i < HTS_Label_get_size( label ) / sss->nstate; i++ )
>>    {
>>       for ( j = 2; j < sss->nstate + 1; j++ )
>>       {
>>          labelStringToFree = labelString->next;
>>          labelString->next = labelString->next->next;
>>
>>          if ( j < sss->nstate )
>>          {
>>             HTS_free( labelStringToFree->name );
>>             HTS_free( labelStringToFree );
>>          }
>>       }
>>       labelString = labelString->next;
>>    }
>>    label->size /= sss->nstate;
>>
>>
>>
>>    // next_time = 0;
>>    // next_state = 0;
>>    // state = 0;
>>    // for (i = 0; i < HTS_Label_get_size(label); i++) {
>>    //    temp = HTS_Label_get_end_frame(label, i);
>>    //    if (temp >= 0) {
>>    //       next_time += HTS_set_duration(&sss->duration[next_state], &duration_mean[next_state], &duration_vari[next_state], state + sss->nstate - next_state, temp - next_time);
>>    //       next_state = state + sss->nstate;
>>    //    } else if (i + 1 == HTS_Label_get_size(label)) {
>>    //       HTS_error(-1, "HTS_SStreamSet_create: The time of final label is not specified.\n");
>>    //       HTS_set_duration(&sss->duration[next_state], &duration_mean[next_state], &duration_vari[next_state], state + sss->nstate - next_state, 0.0);
>>    //    }
>>    //    state += sss->nstate;
>>    // }
>>
>> }
>>
>>
>> YangWang
>
>
>
> --
> Heiga ZEN (in Japanese)
> Byung Ha CHUN (in Korean)
> <heigazen@xxxxxxxxxx>
>
- References
- [hts-users:03459] Probable little bug in hts_engine API version 1.06, 王洋
- [hts-users:03460] Re: Probable little bug in hts_engine API version 1.06, Heiga ZEN (Byung Ha CHUN)