[hts-users:03187] Re: Problem of building regression tree (average voice model)


Hi,

Please check stream[3] in your data before beginning the training.
For example, the HList command can be used.
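
If HList is not convenient, a rough check can also be scripted. The sketch
below is only an illustration and is not part of HTS or HTK. It assumes that
each lf0 file is a headerless sequence of 32-bit floats in native byte order
and that unvoiced frames are marked with a very large negative value such as
-1.0e+10. The names read_lf0, delta_stats and UNVOICED_LIMIT are made up for
this example. It compares the variance of a naively computed delta with the
variance over frames whose neighbours are both voiced.

```python
import array
import statistics
import sys

# Assumption: values at or below this limit mark unvoiced frames
# (for example a magic number such as -1.0e+10).
UNVOICED_LIMIT = -1.0e9


def read_lf0(path):
    """Read a headerless file of float32 values (native byte order assumed)."""
    values = array.array("f")
    with open(path, "rb") as f:
        values.frombytes(f.read())  # raises ValueError on a truncated file
    return list(values)


def _pvar(xs):
    """Population variance, or 0.0 for an empty list."""
    return statistics.pvariance(xs) if xs else 0.0


def delta_stats(lf0, limit=UNVOICED_LIMIT):
    """Return (naive_variance, voiced_only_variance) of the delta sequence.

    The delta is 0.5 * (x[t+1] - x[t-1]), evaluated at voiced frames t.
    "Naive" keeps every such delta; "voiced only" keeps a delta only when
    both neighbouring frames are voiced as well.
    """
    naive, voiced = [], []
    for t in range(1, len(lf0) - 1):
        prev, cur, nxt = lf0[t - 1], lf0[t], lf0[t + 1]
        if cur <= limit:
            continue  # unvoiced frame: no continuous observation here
        d = 0.5 * (nxt - prev)
        naive.append(d)
        if prev > limit and nxt > limit:
            voiced.append(d)
    return _pvar(naive), _pvar(voiced)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        naive, voiced = delta_stats(read_lf0(path))
        print(f"{path}: naive delta var {naive:.3e}, voiced-only {voiced:.3e}")
```

It could be run as "python check_lf0.py *.lf0" (the script name is
hypothetical). A file whose naive variance is many orders of magnitude above
the voiced-only one is worth a closer look.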

Regards,
Keiichiro Oura


2012/2/28 li jay <lij.acd@xxxxxxxxx>:
> Hi,
>
> Thank you for your help. I conducted some experiments. Originally I have
> speech data from 20 speakers. I built 20 models, each using the sentences
> of only one speaker (SD models). I found that the re_clustered.mmf files of
> some speakers' models were abnormal. They looked like this:
>
>     <STREAM> 3
>     <NUMMIXES> 2
>     <MIXTURE> 1 9.436758e-01
>     <MEAN> 1
>      -9.250264e-03
>     <VARIANCE> 1
>      1.051578e+12
>
>     <STREAM> 3
>     <NUMMIXES> 2
>     <MIXTURE> 1 3.212932e-01
>     <MEAN> 1
>      -1.123511e-02
>     <VARIANCE> 1
>      1.051578e+12
>
>     <STREAM> 3
>     <NUMMIXES> 2
>     <MIXTURE> 1 8.911794e-01
>     <MEAN> 1
>      6.913134e-03
>     <VARIANCE> 1
>      1.051578e+12
>
> The variance of stream 3 was far too big and, oddly, the variances were
> identical. Could it be a bug in HTS?
> Besides, I tried including these abnormal speakers' speech data in the
> training data. Once I include it, an error occurs when building the
> regression tree, but without it I can build the regression tree
> successfully. I think the abnormal data made the model training
> unsuccessful, which caused the regression tree building to fail. But there
> is something strange: I can still generate voice using the models even
> though the variance of stream 3 is too big.
>
> Additionally, I plotted the lf0 contours of those abnormal speakers, and
> there seemed to be nothing wrong with them; they looked like the lf0
> contours of the normal speakers. I am not sure which extracted lf0 files
> caused the abnormality, because I used the same method and tools to extract
> them as I did for the normal files. Is it possible that some values are not
> supported by HTS because they are too big or too small?
>
> Regards,
> Jay
>
> 2012/2/24 Heiga ZEN (Byung Ha CHUN) <heigazen@xxxxxxxxxx>
>
>> Hi,
>>
>> It is unlikely that having 20 speakers caused this problem.  I expect
>> that some data files are corrupted, i.e., extracting speech features from
>> the waveforms failed.  Please check whether your data is OK.
>>
>> Regards,
>>
>> Heiga
>>
>> 2012/02/24 9:23 "li jay" <lij.acd@xxxxxxxxx>:
>>
>>> Hi,
>>>
>>> Thank you for telling me this. Compared to the delta f0 variance of the
>>> speaker-dependent model I trained previously, it is too big (the model I
>>> am training now is an average voice model). I thought it was because I
>>> used speech from 20 speakers, which resulted in a big delta f0 variance. I
>>> used the same configuration and option settings as for speaker-dependent
>>> training and only replaced the training data with the data of the 20
>>> speakers. Is it correct to train an average voice model without modifying
>>> any configuration or option?
>>>
>>> Regards,
>>> Jay
>>>
>>> 2012/2/24 Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>
>>>>
>>>> Hi,
>>>>
>>>> It seems that delta lf0 is not trained correctly.
>>>>
>>>> <VARIANCE> 1
>>>> 2.078870e+12
>>>>
>>>> The variance is too big.
>>>> You should check the delta f0 sequence in the training data.
>>>>
>>>> Regards,
>>>> Keiichiro Oura
>>>>
>>>> 2012/2/24 li jay <lij.acd@xxxxxxxxx>:
>>>> > Hi,
>>>> >
>>>> > The following is part of re_clustered.mmf:
>>>> >
>>>> >  ~p "lf0_s2_523-3"
>>>> >  <STREAM> 3
>>>> >  <NUMMIXES> 2
>>>> >  <MIXTURE> 1 1.582146e-01
>>>> >  <MEAN> 1
>>>> >   -9.624681e-03
>>>> >  <VARIANCE> 1
>>>> >   2.078870e+12
>>>> >  <GCONST> 3.020072e+01
>>>> >  <MIXTURE> 2 8.417839e-01
>>>> >  <MEAN> 0
>>>> >  <VARIANCE> 0
>>>> >  <GCONST> 0.000000e+00
>>>> >  ~p "lf0_s2_523-4"
>>>> >  <STREAM> 4
>>>> >  <NUMMIXES> 2
>>>> >  <MIXTURE> 1 1.582144e-01
>>>> >  <MEAN> 1
>>>> >   -1.753086e-03
>>>> >  <VARIANCE> 1
>>>> >   8.353170e-04
>>>> >  <GCONST> -5.249823e+00
>>>> >  <MIXTURE> 2 8.417841e-01
>>>> >  ...
>>>> >
>>>> >  ~h "CH_dz`-CH_U+sp/T:x_4_4_x_4/WS:1_6_6/CS:2_7_8/CW:2_1_2/PS:5_19_23/PW:5_1_5/PC:2_1_2"
>>>> >  <BEGINHMM>
>>>> >  <NUMSTATES> 7
>>>> >  <STATE> 2
>>>> >  <STREAM> 1
>>>> >  ~p "mgc_s2_21"
>>>> >  <STREAM> 2
>>>> >  ~p "lf0_s2_523-2"
>>>> >  <STREAM> 3
>>>> >  ~p "lf0_s2_523-3"
>>>> >  <STREAM> 4
>>>> >  ~p "lf0_s2_523-4"
>>>> >  <STATE> 3
>>>> >  <STREAM> 1
>>>> >  ...
>>>> >
>>>> > There seems to be no error in stream[3]. What could affect the
>>>> > process of building the regression tree?
>>>> >
>>>> > Regards,
>>>> > Jay
>>>> >
>>>> > 2012/2/24 Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> The distributions are in .../cmp/re_clustered.mmf
>>>> >>
>>>> >> Regards,
>>>> >> Keiichiro Oura
>>>> >>
>>>> >>
>>>> >> 2012/2/24 li jay <lij.acd@xxxxxxxxx>:
>>>> >> > Hi,
>>>> >> >
>>>> >> > The following is part of the log file when I tried to build the
>>>> >> > regression tree:
>>>> >> >
>>>> >> > Splitting Node 32763, score 9.997541e+09
>>>> >> > (Stream=3, vSize=1)
>>>> >> > Splitting Node 32765, score 9.997541e+09
>>>> >> > (Stream=3, vSize=1)
>>>> >> > Splitting Node 32767, score 9.997541e+09
>>>> >> > (Stream=3, vSize=1)
>>>> >> > Splitting Node -32767, score 9.997541e+09
>>>> >> > (Stream=3, vSize=1)
>>>> >> > Splitting Node -32765, score 9.997541e+09
>>>> >> > (Stream=3, vSize=1)
>>>> >> > Splitting Node -32763, score 9.997541e+09
>>>> >> >
>>>> >> > The index seems to have gone negative because an overflow occurred.
>>>> >> > Could you tell me in which file I can check the distribution of
>>>> >> > stream[3]?
>>>> >> > Thank you for your help.
>>>> >> >
>>>> >> > Regards,
>>>> >> > Jay
>>>> >> >
>>>> >> > 2012/2/23 Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>
>>>> >> >>
>>>> >> >> Hi,
>>>> >> >>
>>>> >> >> This value is the node index in "reg.tree".
>>>> >> >>
>>>> >> >>  printf("Splitting Node %d, score %e\n", r->nodeIndex, score);
>>>> >> >>
>>>> >> >> The index is always a positive value.
>>>> >> >> I don't know why the split is in a loop...
>>>> >> >> Could you check the distribution of stream[3]?
>>>> >> >>
>>>> >> >> Regards,
>>>> >> >> Keiichiro Oura
>>>> >> >>
>>>> >> >>
>>>> >> >> 2012/2/23 li jay <lij.acd@xxxxxxxxx>:
>>>> >> >> > Thank you for your reply.
>>>> >> >> >
>>>> >> >> > I've tried HTS-2.2, replacing the HTS commands with the ones from
>>>> >> >> > HTS-2.2. The result was exactly the same as with HTS-2.1.1. The score
>>>> >> >> > stayed the same, and the nodes split endlessly.
>>>> >> >> >
>>>> >> >> > What do you mean by 'Splitting Node' being a negative value? Do you
>>>> >> >> > mean the score?
>>>> >> >> >
>>>> >> >> > Regards,
>>>> >> >> > Jay
>>>> >> >> >
>>>> >> >> > 2012/2/22 Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>
>>>> >> >> >>
>>>> >> >> >> Hi,
>>>> >> >> >>
>>>> >> >> >> Could you try HTS-2.2?
>>>> >> >> >> I don't know why 'Splitting Node' is a negative value.
>>>> >> >> >>
>>>> >> >> >> Regards,
>>>> >> >> >> Keiichiro Oura
>>>> >> >> >>
>>>> >> >> >> 2012/2/22 li jay <lij.acd@xxxxxxxxx>:
>>>> >> >> >> > Hi,
>>>> >> >> >> >
>>>> >> >> >> > I've been trying to build a regression tree for speaker adaptation. I
>>>> >> >> >> > am using HTS 2.1.1. I've trained an average voice model from 4000
>>>> >> >> >> > sentences (about 2.5 hrs) from 20 speakers. Generating voice with the
>>>> >> >> >> > average voice model was successful. I wanted to apply speaker
>>>> >> >> >> > adaptation to this average voice model, so I tried to build a
>>>> >> >> >> > regression tree with the command below:
>>>> >> >> >> > /usr/local/HTS-2.1.1/bin/HHEd -A -B \
>>>> >> >> >> >   -C /home/jay/TTS/try/AST_female_20_speakers_2/configs/trn.cnf \
>>>> >> >> >> >   -D -T 1 -p -i \
>>>> >> >> >> >   -H /home/jay/TTS/try/AST_female_20_speakers_2/models/cmp/re_clustered.mmf \
>>>> >> >> >> >   -M /home/jay/TTS/try/AST_female_20_speakers_2/models/cmp/regTrees \
>>>> >> >> >> >   /home/jay/TTS/try/AST_female_20_speakers_2/edfiles/cmp/reg.hed \
>>>> >> >> >> >   /home/jay/TTS/try/AST_female_20_speakers_2/data/lists/full.list
>>>> >> >> >> >
>>>> >> >> >> > The problem was that the splitting of nodes did not finish. It seemed
>>>> >> >> >> > to be in a loop, and the score stayed the same, so the HHEd command
>>>> >> >> >> > never stopped. The log file shows the following:
>>>> >> >> >> >
>>>> >> >> >> > HTK Configuration Parameters[10]
>>>> >> >> >> >   Module/Tool     Parameter                  Value
>>>> >> >> >> > #                 MINDUR                         5
>>>> >> >> >> > #                 MAXSTDDEVCOEF                 10
>>>> >> >> >> > #                 APPLYDURVARFLOOR              TRUE
>>>> >> >> >> > #                 DURVARFLOORPERCENTILE          1.000000
>>>> >> >> >> > #                 SHRINKOCCTHRESH  Vector 4 500.0 100.0 100.0 100.0
>>>> >> >> >> > #                 VFLOORSCALESTR  Vector 4 0.01 0.01 0.01 0.01
>>>> >> >> >> > #                 MINLEAFOCC                     0
>>>> >> >> >> > #                 NATURALWRITEORDER              TRUE
>>>> >> >> >> > #                 NATURALREADORDER              TRUE
>>>> >> >> >> > #                 APPLYVFLOOR                 TRUE
>>>> >> >> >> >
>>>> >> >> >> > // construct regression class tree
>>>> >> >> >> > RC 32 reg
>>>> >> >> >> >  Building regression tree with 32 terminals (4 streams)
>>>> >> >> >> > Creating regression class tree with ident reg.tree and baseclass reg.base
>>>> >> >> >> > Splitting Node 1, score 1.000000e+10
>>>> >> >> >> > (Stream splitting)
>>>> >> >> >> > Splitting Node 3, score 1.000000e+10
>>>> >> >> >> > (Stream splitting)
>>>> >> >> >> > Splitting Node 5, score 1.000000e+10
>>>> >> >> >> > (Stream splitting)
>>>> >> >> >> > Splitting Node 7, score 1.000000e+10
>>>> >> >> >> > (MSD splitting)
>>>> >> >> >> > Splitting Node 6, score 1.000000e+10
>>>> >> >> >> > (MSD splitting)
>>>> >> >> >> > Splitting Node 10, score 8.998759e+10
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > Splitting Node 13, score 2.999760e+10
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > Splitting Node 4, score 1.000000e+10
>>>> >> >> >> > (MSD splitting)
>>>> >> >> >> > Splitting Node 15, score 9.997541e+09
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > Splitting Node 19, score 9.997541e+09
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > Splitting Node 21, score 9.997541e+09
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > ...
>>>> >> >> >> > ...
>>>> >> >> >> > ...
>>>> >> >> >> > Splitting Node -16495, score 9.997541e+09
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > Splitting Node -16493, score 9.997541e+09
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > Splitting Node -16491, score 9.997541e+09
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> >
>>>> >> >> >> > Could you help me with this problem? My questions are:
>>>> >> >> >> > 1: What could cause this endless node splitting?
>>>> >> >> >> > 2: Could it be a problem with the average voice modeling? Is there any
>>>> >> >> >> > option to enable average voice modeling? I trained the average voice
>>>> >> >> >> > model just like a speaker-dependent model, with the same scripts,
>>>> >> >> >> > except that the training data came from different people.
>>>> >> >> >> >
>>>> >> >> >> > Thank you.
>>>> >> >> >> >
>>>> >> >> >> > Regards,
>>>> >> >> >> > Jay
>>>> >> >> >>
>>>> >> >> >
>>>> >> >>
>>>> >> >
>>>> >>
>>>> >
>>>>
>>>
>
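
The node indices in the quoted log (32767 followed by -32767, -32765, ...)
are consistent with a signed 16-bit counter wrapping around. This is an
inference from the numbers alone and has not been confirmed against the HHEd
source. The small sketch below reproduces the sequence by emulating 16-bit
arithmetic with ctypes; next_index_16bit and the step of 2 are assumptions
made for this illustration.

```python
import ctypes


def next_index_16bit(index, step=2):
    """Advance an index the way a signed 16-bit integer would, wrapping on overflow."""
    # ctypes integer types do no overflow checking, so the value wraps.
    return ctypes.c_int16(index + step).value


idx = 32763
seen = [idx]
for _ in range(5):
    idx = next_index_16bit(idx)
    seen.append(idx)

print(seen)  # [32763, 32765, 32767, -32767, -32765, -32763]
```

If this reading is right, the negative indices are a symptom of the endless
splitting rather than its cause.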
