[hts-users:03193] Re: Problem of building regression tree (average voice model)

2012/2/28 Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>

Hi,

Please check stream[3] in your data before starting the training.
For example, you can use the HList command to inspect the data files.

Regards,
Keiichiro Oura
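
As a complement to inspecting files by hand, the check suggested above can be sketched in Python. This is only a sketch under stated assumptions: the training files use the standard HTK parameter layout (12-byte header followed by 4-byte floats), they are little-endian (NATURALREADORDER is TRUE in the posted configuration, but the byte order depends on the machine), and unvoiced frames are marked with -1.0e10. The function names, the `limit` threshold, and the column index of delta lf0 are hypothetical and must be adapted to the actual feature order.

```python
import math
import struct

UNVOICED = -1.0e10  # assumed unvoiced marker; adjust to your data


def read_htk_frames(data, byteorder="<"):
    """Parse HTK-format parameter data (bytes) into a list of float tuples."""
    # 12-byte header: nSamples, sampPeriod (int32), sampSize, parmKind (int16)
    n_samples, _period, samp_size, _kind = struct.unpack(
        byteorder + "iihh", data[:12])
    dim = samp_size // 4  # each value is a 4-byte float
    frames = []
    for i in range(n_samples):
        start = 12 + i * samp_size
        frames.append(struct.unpack("%s%df" % (byteorder, dim),
                                    data[start:start + samp_size]))
    return frames


def suspicious_frames(frames, column, limit=10.0):
    """Return (frame_index, value) pairs in `column` that are NaN, or that
    are not the unvoiced marker and exceed `limit` in magnitude."""
    bad = []
    for i, frame in enumerate(frames):
        value = frame[column]
        unvoiced = abs(value - UNVOICED) < 1.0e-3 * abs(UNVOICED)
        if math.isnan(value) or (not unvoiced and abs(value) > limit):
            bad.append((i, value))
    return bad
```

To use it, read one training file with `open(path, "rb").read()`, pass the bytes to `read_htk_frames`, and call `suspicious_frames` with the column that holds delta lf0 in your feature vectors. Any frame it reports is worth inspecting by hand.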

2012/2/28 li jay <lij.acd@xxxxxxxxx>:

> Hi,
>
> Thank you for your help. I ran some experiments. I have speech data from
> 20 speakers, so I built 20 speaker-dependent (SD) models, each from the
> sentences of a single speaker. For some speakers, re_clustered.mmf looks
> abnormal. These files contain entries like:
>
> 745 <STREAM> 3
> 746 <NUMMIXES> 2
> 747 <MIXTURE> 1 9.436758e-01
> 748 <MEAN> 1
> 749 -9.250264e-03
> 750 <VARIANCE> 1
> 751 1.051578e+12
>
> 784 <STREAM> 3
> 785 <NUMMIXES> 2
> 786 <MIXTURE> 1 3.212932e-01
> 787 <MEAN> 1
> 788 -1.123511e-02
> 789 <VARIANCE> 1
> 790 1.051578e+12
>
> 979 <STREAM> 3
> 980 <NUMMIXES> 2
> 981 <MIXTURE> 1 8.911794e-01
> 982 <MEAN> 1
> 983 6.913134e-03
> 984 <VARIANCE> 1
> 985 1.051578e+12
>
> The variance of stream 3 is far too large, and, oddly, it is identical in
> every case. Could this be a bug in HTS?
> I also tried adding these abnormal speakers' data to the training set.
> Whenever their data is included, building the regression tree fails;
> without it, the regression tree builds successfully. I think the abnormal
> data prevented the models from training properly, which in turn made the
> regression tree fail. One thing is strange, though: I can still generate
> speech from the models even though the variance of stream 3 is too large.
>
> Additionally, I plotted the lf0 of the abnormal speakers, and nothing
> seemed wrong with the plots; they looked like the lf0 plots of the normal
> speakers. I am not sure which extracted lf0 files caused the abnormality,
> because I used the same method and tools to extract them as for the normal
> files. Is it possible that HTS does not support some values because they
> are too large or too small?
>
> Regards,
> Jay
>
> 2012/2/24 Heiga ZEN (Byung Ha CHUN) <heigazen@xxxxxxxxxx>
>
>> Hi,
>>
>> It is unlikely that having 20 speakers caused this problem. I expect that
>> some data files are corrupted, i.e., speech feature extraction from the
>> waveforms failed. Please check whether your data is OK.
>>
>> Regards,
>>
>> Heiga
>>
>> 2012/02/24 9:23 "li jay" <lij.acd@xxxxxxxxx>:
>>
>>> Hi,
>>>
>>> Thank you for telling me this. Compared with the speaker-dependent model
>>> I trained previously, the delta f0 variance is too large (the model I am
>>> training now is an average voice model). I thought this was because I used
>>> sentences from 20 speakers, which would give a large delta f0 variance. I
>>> used the same configuration and option settings as for speaker-dependent
>>> training and only replaced the training data with the data of the 20
>>> speakers. Is it correct to train an average voice model without modifying
>>> any configuration or options?
>>>
>>> Regards,
>>> Jay
>>>
>>> 2012/2/24 Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>
>>>>
>>>> Hi,
>>>>
>>>> It seems that delta lf0 is not trained correctly.
>>>>
>>>> <VARIANCE> 1
>>>> 2.078870e+12
>>>>
>>>> The variance is too big.
>>>> You should check the delta f0 sequences in the training data.
>>>>
>>>> Regards,
>>>> Keiichiro Oura
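
Instead of searching re_clustered.mmf by eye, the oversized variances can be listed with a short script. The following is a minimal sketch, assuming the text-format MMF layout shown in the excerpt quoted in this thread (a `~p "name"` macro line, a `<STREAM>` line, and a `<VARIANCE> n` line followed by one line of values). The function names and the threshold are hypothetical. The helper `gconst_1d` assumes GCONST is ln(2*pi*variance) for a one-dimensional Gaussian, which is consistent with the values in the excerpt.

```python
import math


def large_variances(mmf_text, threshold=1.0e6):
    """Return (macro, stream, variance) for each variance above threshold."""
    hits = []
    macro = None
    stream = None
    lines = mmf_text.splitlines()
    for i, line in enumerate(lines):
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "~p" and len(tokens) > 1:
            macro = tokens[1].strip('"')
        elif tokens[0] == "<STREAM>" and len(tokens) > 1:
            stream = int(tokens[1])
        elif (tokens[0] == "<VARIANCE>" and len(tokens) > 1
              and int(tokens[1]) > 0 and i + 1 < len(lines)):
            # The values follow on the next line; zero-dimensional
            # (MSD unvoiced) mixtures have no value line and are skipped.
            for text in lines[i + 1].split():
                value = float(text)
                if value > threshold:
                    hits.append((macro, stream, value))
    return hits


def gconst_1d(variance):
    """GCONST of a one-dimensional Gaussian: ln(2 * pi * variance)."""
    return math.log(2.0 * math.pi * variance)
```

For the excerpt quoted in this thread, `gconst_1d(2.078870e+12)` is about 30.2007, matching the stored `<GCONST> 3.020072e+01`. This suggests the huge variance is what the model actually holds rather than a printing problem.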
>>>>
>>>> 2012/2/24 li jay <lij.acd@xxxxxxxxx>:
>>>> > Hi,
>>>> >
>>>> > The following is part of re_clustered.mmf:
>>>> >
>>>> > 170535 ~p "lf0_s2_523-3"
>>>> > 170536 <STREAM> 3
>>>> > 170537 <NUMMIXES> 2
>>>> > 170538 <MIXTURE> 1 1.582146e-01
>>>> > 170539 <MEAN> 1
>>>> > 170540 -9.624681e-03
>>>> > 170541 <VARIANCE> 1
>>>> > 170542 2.078870e+12
>>>> > 170543 <GCONST> 3.020072e+01
>>>> > 170544 <MIXTURE> 2 8.417839e-01
>>>> > 170545 <MEAN> 0
>>>> > 170546 <VARIANCE> 0
>>>> > 170547 <GCONST> 0.000000e+00
>>>> > 170548 ~p "lf0_s2_523-4"
>>>> > 170549 <STREAM> 4
>>>> > 170550 <NUMMIXES> 2
>>>> > 170551 <MIXTURE> 1 1.582144e-01
>>>> > 170552 <MEAN> 1
>>>> > 170553 -1.753086e-03
>>>> > 170554 <VARIANCE> 1
>>>> > 170555 8.353170e-04
>>>> > 170556 <GCONST> -5.249823e+00
>>>> > 170557 <MIXTURE> 2 8.417841e-01
>>>> > ...
>>>> >
>>>> > 260387 ~h "CH_dz`-CH_U+sp/T:x_4_4_x_4/WS:1_6_6/CS:2_7_8/CW:2_1_2/PS:5_19_23/PW:5_1_5/PC:2_1_2"
>>>> > 260388 <BEGINHMM>
>>>> > 260389 <NUMSTATES> 7
>>>> > 260390 <STATE> 2
>>>> > 260391 <STREAM> 1
>>>> > 260392 ~p "mgc_s2_21"
>>>> > 260393 <STREAM> 2
>>>> > 260394 ~p "lf0_s2_523-2"
>>>> > 260395 <STREAM> 3
>>>> > 260396 ~p "lf0_s2_523-3"
>>>> > 260397 <STREAM> 4
>>>> > 260398 ~p "lf0_s2_523-4"
>>>> > 260399 <STATE> 3
>>>> > 260400 <STREAM> 1
>>>> > ...
>>>> >
>>>> > There seems to be no error in stream[3]. What else could affect the
>>>> > regression tree building process?
>>>> >
>>>> > Regards,
>>>> > Jay
>>>> >
>>>> > 2012/2/24 Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> The distributions are in .../cmp/re_clustered.mmf
>>>> >>
>>>> >> Regards,
>>>> >> Keiichiro Oura
>>>> >>
>>>> >>
>>>> >> 2012/2/24 li jay <lij.acd@xxxxxxxxx>:
>>>> >> > Hi,
>>>> >> >
>>>> >> > The following is part of the log file from when I tried to build
>>>> >> > the regression tree:
>>>> >> >
>>>> >> > Splitting Node 32763, score 9.997541e+09
>>>> >> > (Stream=3, vSize=1)
>>>> >> > Splitting Node 32765, score 9.997541e+09
>>>> >> > (Stream=3, vSize=1)
>>>> >> > Splitting Node 32767, score 9.997541e+09
>>>> >> > (Stream=3, vSize=1)
>>>> >> > Splitting Node -32767, score 9.997541e+09
>>>> >> > (Stream=3, vSize=1)
>>>> >> > Splitting Node -32765, score 9.997541e+09
>>>> >> > (Stream=3, vSize=1)
>>>> >> > Splitting Node -32763, score 9.997541e+09
>>>> >> >
>>>> >> > The index seems to have gone negative because of an overflow.
>>>> >> > Could you tell me in which file I can check the distribution of
>>>> >> > stream[3]?
>>>> >> > Thank you for your help.
>>>> >> >
>>>> >> > Regards,
>>>> >> > Jay
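
The overflow explanation can be checked with a little arithmetic. The sketch below assumes the node index is held in a signed 16-bit integer and increases by 2 per split; the thread does not confirm either point, so this only shows that the logged sequence is consistent with that assumption. The function name `wrap_int16` is hypothetical.

```python
def wrap_int16(value):
    """Reduce an integer to the signed 16-bit range [-32768, 32767]."""
    return (value + 32768) % 65536 - 32768


# Start from the first index shown in the log and add 2 per split.
index = 32763
seen = []
for _ in range(6):
    seen.append(index)
    index = wrap_int16(index + 2)

print(seen)
```

Under these assumptions the sequence passes from 32767 directly to -32767, the same jump that appears in the log.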
>>>> >> >
>>>> >> > 2012/2/23 Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>
>>>> >> >>
>>>> >> >> Hi,
>>>> >> >>
>>>> >> >> This value is the node index in "reg.tree".
>>>> >> >>
>>>> >> >> printf("Splitting Node %d, score %e\n", r->nodeIndex, score);
>>>> >> >>
>>>> >> >> The index is always a positive value.
>>>> >> >> I don't know why the splitting is stuck in a loop...
>>>> >> >> Could you check the distribution of stream[3]?
>>>> >> >>
>>>> >> >> Regards,
>>>> >> >> Keiichiro Oura
>>>> >> >>
>>>> >> >>
>>>> >> >> 2012/2/23 li jay <lij.acd@xxxxxxxxx>:
>>>> >> >> > Thank you for your reply.
>>>> >> >> >
>>>> >> >> > I've tried HTS-2.2, replacing the HTS commands with those from
>>>> >> >> > HTS-2.2. The result was exactly the same as with HTS-2.1.1: the
>>>> >> >> > score stayed the same, and the nodes split endlessly.
>>>> >> >> >
>>>> >> >> > What do you mean by 'Splitting Node' being a negative value? Do
>>>> >> >> > you mean the score?
>>>> >> >> >
>>>> >> >> > Regards,
>>>> >> >> > Jay
>>>> >> >> >
>>>> >> >> > 2012/2/22 Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>
>>>> >> >> >>
>>>> >> >> >> Hi,
>>>> >> >> >>
>>>> >> >> >> Could you try HTS-2.2?
>>>> >> >> >> I don't know why 'Splitting Node' is a negative value.
>>>> >> >> >>
>>>> >> >> >> Regards,
>>>> >> >> >> Keiichiro Oura
>>>> >> >> >>
>>>> >> >> >> 2012/2/22 li jay <lij.acd@xxxxxxxxx>:
>>>> >> >> >> > Hi,
>>>> >> >> >> >
>>>> >> >> >> > I've been trying to build a regression tree for speaker
>>>> >> >> >> > adaptation. I am using HTS 2.1.1. I've trained an average voice
>>>> >> >> >> > model from 4000 sentences (about 2.5 hrs) from 20 speakers, and
>>>> >> >> >> > I could generate speech with it successfully. I wanted to apply
>>>> >> >> >> > speaker adaptation to this average voice model, so I tried to
>>>> >> >> >> > build a regression tree with the command below:
>>>> >> >> >> > /usr/local/HTS-2.1.1/bin/HHEd -A -B \
>>>> >> >> >> >   -C /home/jay/TTS/try/AST_female_20_speakers_2/configs/trn.cnf \
>>>> >> >> >> >   -D -T 1 -p -i \
>>>> >> >> >> >   -H /home/jay/TTS/try/AST_female_20_speakers_2/models/cmp/re_clustered.mmf \
>>>> >> >> >> >   -M /home/jay/TTS/try/AST_female_20_speakers_2/models/cmp/regTrees \
>>>> >> >> >> >   /home/jay/TTS/try/AST_female_20_speakers_2/edfiles/cmp/reg.hed \
>>>> >> >> >> >   /home/jay/TTS/try/AST_female_20_speakers_2/data/lists/full.list
>>>> >> >> >> >
>>>> >> >> >> > The problem was that the node splitting did not finish. It
>>>> >> >> >> > seemed to be stuck in a loop, and the score stayed the same, so
>>>> >> >> >> > the HHEd command never stopped. The log file shows the
>>>> >> >> >> > following:
>>>> >> >> >> >
>>>> >> >> >> > HTK Configuration Parameters[10]
>>>> >> >> >> > Module/Tool Parameter Value
>>>> >> >> >> > # MINDUR 5
>>>> >> >> >> > # MAXSTDDEVCOEF 10
>>>> >> >> >> > # APPLYDURVARFLOOR TRUE
>>>> >> >> >> > # DURVARFLOORPERCENTILE 1.000000
>>>> >> >> >> > # SHRINKOCCTHRESH Vector 4 500.0 100.0 100.0 100.0
>>>> >> >> >> > # VFLOORSCALESTR Vector 4 0.01 0.01 0.01 0.01
>>>> >> >> >> > # MINLEAFOCC 0
>>>> >> >> >> > # NATURALWRITEORDER TRUE
>>>> >> >> >> > # NATURALREADORDER TRUE
>>>> >> >> >> > # APPLYVFLOOR TRUE
>>>> >> >> >> >
>>>> >> >> >> > // construct regression class tree
>>>> >> >> >> > RC 32 reg
>>>> >> >> >> > Building regression tree with 32 terminals (4 streams)
>>>> >> >> >> > Creating regression class tree with ident reg.tree and baseclass reg.base
>>>> >> >> >> > Splitting Node 1, score 1.000000e+10
>>>> >> >> >> > (Stream splitting)
>>>> >> >> >> > Splitting Node 3, score 1.000000e+10
>>>> >> >> >> > (Stream splitting)
>>>> >> >> >> > Splitting Node 5, score 1.000000e+10
>>>> >> >> >> > (Stream splitting)
>>>> >> >> >> > Splitting Node 7, score 1.000000e+10
>>>> >> >> >> > (MSD splitting)
>>>> >> >> >> > Splitting Node 6, score 1.000000e+10
>>>> >> >> >> > (MSD splitting)
>>>> >> >> >> > Splitting Node 10, score 8.998759e+10
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > Splitting Node 13, score 2.999760e+10
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > Splitting Node 4, score 1.000000e+10
>>>> >> >> >> > (MSD splitting)
>>>> >> >> >> > Splitting Node 15, score 9.997541e+09
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > Splitting Node 19, score 9.997541e+09
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > Splitting Node 21, score 9.997541e+09
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > ...
>>>> >> >> >> > ...
>>>> >> >> >> > ...
>>>> >> >> >> > Splitting Node -16495, score 9.997541e+09
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > Splitting Node -16493, score 9.997541e+09
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > Splitting Node -16491, score 9.997541e+09
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> >
>>>> >> >> >> > Could you help me with this problem? My questions are:
>>>> >> >> >> > 1: What could cause this endless node splitting?
>>>> >> >> >> > 2: Could the problem be with the average voice modeling? Is
>>>> >> >> >> > there an option to enable average voice modeling? I trained the
>>>> >> >> >> > average voice model just like a speaker-dependent model, with
>>>> >> >> >> > the same scripts, except that the training data came from
>>>> >> >> >> > different people.
>>>> >> >> >> >
>>>> >> >> >> > Thank you.
>>>> >> >> >> >
>>>> >> >> >> > Regards,
>>>> >> >> >> > Jay
