[hts-users:03460] Re: Probable little bug in hts_engine API version 1.06
- Subject: [hts-users:03460] Re: Probable little bug in hts_engine API version 1.06
- From: "Heiga ZEN (Byung Ha CHUN)" <heigazen@xxxxxxxxxx>
- Date: Thu, 15 Nov 2012 10:49:31 +0000
- Delivered-to: hts-users@xxxxxxxxxxxxxxx
hts_engine does not support state-level alignments as input labels.
The -vp option makes it use the phoneme-level alignments given in the
input labels; it does not enable state-level ones. Note that HSMMAlign
can produce phoneme-level alignments. If you do want to use
state-level alignments, you can use HMGenS.
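For readers who already have state-level label files, one option is to merge them to phoneme level before passing them to hts_engine with -vp. The following is a minimal sketch in C, not part of HTS or hts_engine: merge_state_alignment and emit_segment are hypothetical names, and it assumes the layout quoted later in this message, where the first state line of each phoneme carries the phoneme label as a fourth field.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAME 1024

/* Hypothetical helper: append one "start end name" line to out. */
static int emit_segment(char *out, size_t out_size, size_t *used,
                        long long start, long long end, const char *name)
{
    int n = snprintf(out + *used, out_size - *used, "%lld %lld %s\n",
                     start, end, name);
    if (n < 0 || (size_t) n >= out_size - *used)
        return -1;              /* output buffer too small */
    *used += (size_t) n;
    return 0;
}

/* Hypothetical helper: merge state-level lines ("t0 t1 name[2] name",
 * "t1 t2 name[3]", ...) into phoneme-level lines ("t0 t3 name").
 * Assumption: a line with a fourth field starts a new phoneme.
 * Returns the number of phonemes written, or -1 on error. */
int merge_state_alignment(const char *in, char *out, size_t out_size)
{
    long long start, end, seg_start = 0, seg_end = 0;
    char state_name[MAX_NAME], phone_name[MAX_NAME], current[MAX_NAME] = "";
    char line[2 * MAX_NAME + 64];
    size_t used = 0;
    int have_seg = 0, count = 0;

    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return -1;
    out[0] = '\0';

    while (*in != '\0') {
        size_t len = strcspn(in, "\n");
        int fields;

        if (len >= sizeof line)
            return -1;          /* line too long for this sketch */
        memcpy(line, in, len);
        line[len] = '\0';
        in += len;
        if (*in == '\n')
            in++;

        fields = sscanf(line, "%lld %lld %1023s %1023s",
                        &start, &end, state_name, phone_name);
        if (fields < 3)
            continue;           /* skip blank or unparsable lines */
        if (fields == 4) {      /* first state of a new phoneme */
            if (have_seg) {
                if (emit_segment(out, out_size, &used,
                                 seg_start, seg_end, current) != 0)
                    return -1;
                count++;
            }
            strcpy(current, phone_name);
            seg_start = start;
            have_seg = 1;
        } else if (!have_seg) {
            return -1;          /* state line before any phoneme label */
        }
        seg_end = end;          /* extend the phoneme to this state's end */
    }
    if (have_seg) {
        if (emit_segment(out, out_size, &used,
                         seg_start, seg_end, current) != 0)
            return -1;
        count++;
    }
    return count;
}
```

The sketch copies time values through unchanged and works on an in-memory string; reading from and writing to label files is left to the caller.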
Regards,
Heiga
2012/11/15 王洋 <yangwang84@xxxxxxxxx>:
> Hi, the HTS working group members,
>
>
>
> I think I have found a small bug in hts_engine API version 1.06. For
> parameter generation from labels with full state alignment, the natural
> input to hts_engine is the label file written by HSMMAlign in full state
> alignment mode, but the two tools assume slightly different label file
> formats.
>
>
>
> The output of HSMMAlign (version 2.2) with full state alignment is like
> this:
>
> t0 t1 full_label1[2] full_label1
>
> t1 t2 full_label1[3]
>
> t2 t3 full_label1[4]
>
> t3 t4 full_label2[2] full_label2
>
> t4 t5 full_label2[3]
>
> t5 t6 full_label2[4]
>
> ...
>
> But hts_engine with the -vp switch only works correctly with
> phoneme-level alignment, whose format is like this:
>
> t0 t3 full_label1
>
> t3 t6 full_label2
>
> ...
>
>
>
> When I synthesize with hts_engine with the -vp switch turned on and label
> files with full state alignment, I get incorrect synthesized speech samples.
>
>
>
> I wrote a piece of code that handles label files with full state
> alignment correctly, which may be useful to you. It looks like this (in
> HTS_SStreamSet_create() ):
>
>
>
> if (HTS_Label_get_frame_specified_flag(label)) {
>
> /* use duration set by user */
>
>
>
> /*
>  This block of code is rewritten to assign state durations correctly
>  in the full state alignment case. It works as follows:
>
>  Step 1. Calculate the specified duration of each state in the state
>          stream from label files that include phoneme boundary
>          information. Such label files are probably the output of
>          HSMMAlign.
>
>  Step 2. Reformat the label strings from the full state alignment
>          format to the format without alignment information by
>          deleting the extra label entries, e.g.:
>
>  Sample output by HSMMAlign          Hand deleted            Reformatted label string
>  t0 t1 full_label1[2] full_label1 => t0 t1 full_label1[2] => full_label1[2]
>  t1 t2 full_label1[3]             => t1 t2 full_label1[3] => (null)
>  t2 t3 full_label1[4]             => t2 t3 full_label1[4] => (null)
>  t3 t4 full_label2[2] full_label2 => t3 t4 full_label2[2] => full_label2[2]
>  t4 t5 full_label2[3]             => t4 t5 full_label2[3] => (null)
>  t5 t6 full_label2[4]             => t5 t6 full_label2[4] => (null)
>  ...
>
>  The final reformatted label string should be:
>  full_label1
>  full_label2
>  ...
>
>  Notes: 1. State-related characters (e.g. [2]) are not removed; they
>            should not interfere with the following parameter
>            generation procedure.
>         2. Phoneme boundary information (e.g. t0) is ignored rather
>            than actually removed.
> */
>
>
>
> int startFrame = 0, endFrame = 0, j = 0;
> HTS_LabelString *labelString, *labelStringToFree;
>
> /* Calculate the specified state duration. */
> for ( i = 0; i < HTS_Label_get_size( label ); i++ )
> {
>    endFrame = (int)( HTS_Label_get_end_frame( label, i ) + 0.5 );
>    sss->duration[ i ] = endFrame - startFrame;
>    startFrame = endFrame;
> }
>
> /* Reformat label string for following parameter generation procedure. */
> labelString = label->head;
> for ( i = 0; i < HTS_Label_get_size( label ) / sss->nstate; i++ )
> {
>    for ( j = 2; j < sss->nstate + 1; j++ )
>    {
>       labelStringToFree = labelString->next;
>       labelString->next = labelString->next->next;
>
>       if ( j < sss->nstate )
>       {
>          HTS_free( labelStringToFree->name );
>          HTS_free( labelStringToFree );
>       }
>    }
>    labelString = labelString->next;
> }
> label->size /= sss->nstate;
>
>
>
> // next_time = 0;
> // next_state = 0;
> // state = 0;
> // for (i = 0; i < HTS_Label_get_size(label); i++) {
> //    temp = HTS_Label_get_end_frame(label, i);
> //    if (temp >= 0) {
> //       next_time += HTS_set_duration(&sss->duration[next_state],
> //          &duration_mean[next_state], &duration_vari[next_state],
> //          state + sss->nstate - next_state, temp - next_time);
> //       next_state = state + sss->nstate;
> //    } else if (i + 1 == HTS_Label_get_size(label)) {
> //       HTS_error(-1, "HTS_SStreamSet_create: The time of final label is not specified.\n");
> //       HTS_set_duration(&sss->duration[next_state],
> //          &duration_mean[next_state], &duration_vari[next_state],
> //          state + sss->nstate - next_state, 0.0);
> //    }
> //    state += sss->nstate;
> // }
>
> }
>
>
> YangWang
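In the list-collapsing loop of the quoted patch, the node unlinked when j equals sss->nstate appears to be dropped from the list without being freed. The following is a minimal stand-alone sketch of the same collapse that frees every removed node. It does not use the hts_engine API: Node, node_new and collapse_states are hypothetical names standing in for the label list, and it assumes every phoneme contributes exactly nstate consecutive entries.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a singly linked list of label strings. */
typedef struct Node {
    char *name;
    struct Node *next;
} Node;

/* Hypothetical constructor: allocate a node holding a copy of name. */
Node *node_new(const char *name, Node *next)
{
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL) {
        free(n);
        return NULL;
    }
    strcpy(n->name, name);
    n->next = next;
    return n;
}

/* Keep the first entry of every group of nstate entries and free the
 * remaining nstate - 1 entries. Returns the number of entries kept. */
size_t collapse_states(Node *head, size_t nstate)
{
    size_t kept = 0;
    Node *keep = head;

    assert(nstate > 0);
    while (keep != NULL) {
        size_t j;
        /* Unlink and free the entries that follow the kept one. */
        for (j = 1; j < nstate && keep->next != NULL; j++) {
            Node *drop = keep->next;
            keep->next = drop->next;
            free(drop->name);
            free(drop);
        }
        kept++;
        keep = keep->next;
    }
    return kept;
}
```

With nstate equal to 3, a list a[2], a[3], a[4], b[2], b[3], b[4] collapses to a[2], b[2], and the return value can replace the division of the list size by the number of states.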
--
Heiga ZEN (in Japanese)
Byung Ha CHUN (in Korean)
<heigazen@xxxxxxxxxx>
- Follow-Ups
- [hts-users:03461] Re: Probable little bug in hts_engine API version 1.06, Keiichiro Oura
- References
- [hts-users:03459] Probable little bug in hts_engine API version 1.06, 王洋