
[hts-users:03193] Re: Problem of building regression tree (average voice model)


Hi,

Thank you. I tried to use HList to inspect the wave files, but its supported parameter kinds do not include lf0, so I could not use it to find the abnormal files. Instead, I used HCompV to calculate the global mean and covariance for each wav file. A few of them had very high variances, such as 7.889393e+05. Although I am not sure what made their variances so large, I removed every file with a variance higher than 1.000000e+00 from the training set. After removing those files, I used about 2000 sentences to train the average voice models and about 200 sentences to adapt them. The regression tree was then built without any error.
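
A per-file check of this kind can also be done outside HTK. The following is a minimal Python sketch, not a reimplementation of HCompV. It assumes that each lf0 file is a headerless sequence of little-endian 32-bit floats, that unvoiced frames are marked with -1.0e10, and that the delta window is (-0.5, 0, 0.5); all of these are assumptions about the data layout and should be checked against the actual feature extraction scripts. The function names are hypothetical. Only the 1.0 threshold comes from the description above.

```python
import array
import sys

UNVOICED = -1.0e10          # assumed marker for unvoiced frames
VOICED_THRESHOLD = -1.0e9   # anything above this is treated as voiced


def read_lf0(path):
    """Read a headerless file of little-endian float32 values (assumed layout)."""
    values = array.array("f")
    with open(path, "rb") as handle:
        values.frombytes(handle.read())
    if sys.byteorder != "little":
        values.byteswap()
    return list(values)


def delta_lf0_variance(lf0):
    """Variance of delta lf0 over frames whose two neighbours are also voiced.

    Returns None when fewer than two usable frames exist.
    """
    voiced = [v > VOICED_THRESHOLD for v in lf0]
    deltas = [
        0.5 * (lf0[t + 1] - lf0[t - 1])
        for t in range(1, len(lf0) - 1)
        if voiced[t - 1] and voiced[t] and voiced[t + 1]
    ]
    if len(deltas) < 2:
        return None
    mean = sum(deltas) / len(deltas)
    return sum((d - mean) ** 2 for d in deltas) / len(deltas)


def flag_files(paths, threshold=1.0):
    """Return (path, variance) pairs whose delta lf0 variance exceeds threshold."""
    flagged = []
    for path in paths:
        variance = delta_lf0_variance(read_lf0(path))
        if variance is not None and variance > threshold:
            flagged.append((path, variance))
    return flagged
```

Calling `flag_files` on a list of lf0 paths returns the files that would be candidates for removal; each flagged file still needs to be inspected by hand to find out why its values jump.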

But I encountered another problem: there were many explosive, corrupted sounds in the adapted voice.
Here is a plot of the first generated sentence: http://i.imgur.com/XBCe3.jpg The upper waveform was generated from the 10-speaker average voice models, and the lower one from the adapted models. This generated voice was OK.
But here is another generated sentence: http://i.imgur.com/jHec7.jpg As in the previous plot, the upper waveform was generated from the 10-speaker average voice models, and the lower one from the adapted models. As you can see, some parts of the adapted voice are corrupted, and these parts sound like noise. What could cause this problem? Was the regression tree built incorrectly?

Regards,
Jay

2012/2/28 Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>
Hi,

Please check stream[3] in your data before beginning the training.
For example, the HList command can be used.

Regards,
Keiichiro Oura


2012/2/28 li jay <lij.acd@xxxxxxxxx>:
> Hi,
>
> Thank you for your help. I conducted some experiments. I have speech data
> from 20 speakers, so I built 20 speaker-dependent (SD) models, each from the
> sentences of a single speaker. The re_clustered.mmf files of some speakers'
> models were abnormal. They contained entries like these:
>
>     <STREAM> 3
>     <NUMMIXES> 2
>     <MIXTURE> 1 9.436758e-01
>     <MEAN> 1
>      -9.250264e-03
>     <VARIANCE> 1
>      1.051578e+12
>
>     <STREAM> 3
>     <NUMMIXES> 2
>     <MIXTURE> 1 3.212932e-01
>     <MEAN> 1
>      -1.123511e-02
>     <VARIANCE> 1
>      1.051578e+12
>
>     <STREAM> 3
>     <NUMMIXES> 2
>     <MIXTURE> 1 8.911794e-01
>     <MEAN> 1
>      6.913134e-03
>     <VARIANCE> 1
>      1.051578e+12
>
> The variance of stream 3 was far too big and, strangely, the variances
> were all identical. Could it be a bug in HTS?
> I also tried including the abnormal speakers' speech data in the training
> data. Whenever I do, an error occurs while building the regression tree;
> without them, the regression tree builds successfully. I think the abnormal
> data made the model training fail, which in turn made the regression tree
> building fail. But something is strange: I can still generate voice from the
> models even when the variance of stream 3 is too big.
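
Entries like the ones quoted above can be located automatically. The following is a minimal Python sketch that scans a text-format MMF for variance values above a limit. It assumes the file was written in ASCII rather than binary form and follows the layout shown in the excerpts (a `<VARIANCE> n` line followed by one line of n values). The function name and the default limit of 1.0e6 are hypothetical choices for illustration, not HTS conventions.

```python
def find_large_variances(lines, limit=1.0e6):
    """Return (macro, stream, variance) for each variance value above limit.

    `macro` is the most recent ~p macro name seen, or None if the
    distribution was defined inline without one.
    """
    hits = []
    macro = None
    stream = None
    expect_values = False
    for raw in lines:
        tokens = raw.split()
        if not tokens:
            continue
        if expect_values:
            expect_values = False
            try:
                values = [float(tok) for tok in tokens]
            except ValueError:
                values = None  # not a value line; treat it as a keyword line
            if values is not None:
                hits.extend((macro, stream, v) for v in values if v > limit)
                continue
        if tokens[0] == "~p" and len(tokens) > 1:
            macro = tokens[1].strip('"')
        elif tokens[0] == "<STREAM>" and len(tokens) > 1:
            stream = int(tokens[1])
        elif tokens[0] == "<VARIANCE>" and len(tokens) > 1:
            # A zero-dimensional variance (MSD unvoiced space) has no value line.
            expect_values = int(tokens[1]) > 0
    return hits
```

Passing an open file object, for example `find_large_variances(open("re_clustered.mmf"))`, lists every offending distribution, which makes it easier to see whether the problem is confined to stream 3.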
>
> Additionally, I plotted the lf0 contours of those abnormal speakers, and
> nothing seemed wrong with them; they looked like the lf0 contours of the
> normal speakers. I am not sure which extracted lf0 files caused the
> abnormality, because I extracted them with the same method and tools as the
> normal files. Is it possible that some values are not supported by HTS
> because they are too big or too small?
>
> Regards,
> Jay
>
> 2012/2/24 Heiga ZEN (Byung Ha CHUN) <heigazen@xxxxxxxxxx>
>
>> Hi,
>>
>> It is unlikely that having 20 speakers has caused this problem.  I expect
>> that some data files are corrupted, i.e., the extraction of speech features
>> from the waveforms failed.  Please check whether your data is OK.
>>
>> Regards,
>>
>> Heiga
>>
>> 2012/02/24 9:23 "li jay" <lij.acd@xxxxxxxxx>:
>>
>>> Hi,
>>>
>>> Thank you for telling me this. Compared to the delta f0 of the
>>> speaker-dependent model I trained previously, it is too big (the model I'm
>>> currently training is an average voice model). I thought it was because I
>>> used sentences from 20 speakers, which resulted in a big delta f0. I used
>>> the same configuration and option settings as for speaker-dependent model
>>> training and only replaced the training data with the data of 20 speakers.
>>> Is it correct to train an average voice model without modifying any
>>> configuration or option?
>>>
>>> Regards,
>>> Jay
>>>
>>> 2012/2/24 Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>
>>>>
>>>> Hi,
>>>>
>>>> It seems that delta lf0 is not trained correctly.
>>>>
>>>> <VARIANCE> 1
>>>> 2.078870e+12
>>>>
>>>> The variance is too big.
>>>> You should check the delta f0 sequences in the training data.
>>>>
>>>> Regards,
>>>> Keiichiro Oura
>>>>
>>>> 2012/2/24 li jay <lij.acd@xxxxxxxxx>:
>>>> > Hi,
>>>> >
>>>> > The following is part of re_clustered.mmf:
>>>> >
>>>> >  ~p "lf0_s2_523-3"
>>>> >  <STREAM> 3
>>>> >  <NUMMIXES> 2
>>>> >  <MIXTURE> 1 1.582146e-01
>>>> >  <MEAN> 1
>>>> >   -9.624681e-03
>>>> >  <VARIANCE> 1
>>>> >   2.078870e+12
>>>> >  <GCONST> 3.020072e+01
>>>> >  <MIXTURE> 2 8.417839e-01
>>>> >  <MEAN> 0
>>>> >  <VARIANCE> 0
>>>> >  <GCONST> 0.000000e+00
>>>> >  ~p "lf0_s2_523-4"
>>>> >  <STREAM> 4
>>>> >  <NUMMIXES> 2
>>>> >  <MIXTURE> 1 1.582144e-01
>>>> >  <MEAN> 1
>>>> >   -1.753086e-03
>>>> >  <VARIANCE> 1
>>>> >   8.353170e-04
>>>> >  <GCONST> -5.249823e+00
>>>> >  <MIXTURE> 2 8.417841e-01
>>>> >  ...
>>>> >
>>>> >  ~h "CH_dz`-CH_U+sp/T:x_4_4_x_4/WS:1_6_6/CS:2_7_8/CW:2_1_2/PS:5_19_23/PW:5_1_5/PC:2_1_2"
>>>> >  <BEGINHMM>
>>>> >  <NUMSTATES> 7
>>>> >  <STATE> 2
>>>> >  <STREAM> 1
>>>> >  ~p "mgc_s2_21"
>>>> >  <STREAM> 2
>>>> >  ~p "lf0_s2_523-2"
>>>> >  <STREAM> 3
>>>> >  ~p "lf0_s2_523-3"
>>>> >  <STREAM> 4
>>>> >  ~p "lf0_s2_523-4"
>>>> >  <STATE> 3
>>>> >  <STREAM> 1
>>>> >  ...
>>>> >
>>>> > There seems to be no error in stream[3]. What could affect the
>>>> > building process of the regression tree?
>>>> >
>>>> > Regards,
>>>> > Jay
>>>> >
>>>> > 2012/2/24 Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>
>>>> >>
>>>> >> Hi,
>>>> >>
>>>> >> The distributions are in .../cmp/re_clustered.mmf
>>>> >>
>>>> >> Regards,
>>>> >> Keiichiro Oura
>>>> >>
>>>> >>
>>>> >> 2012/2/24 li jay <lij.acd@xxxxxxxxx>:
>>>> >> > Hi,
>>>> >> >
>>>> >> > The following is part of the log file when I tried to build the
>>>> >> > regression tree:
>>>> >> >
>>>> >> > Splitting Node 32763, score 9.997541e+09
>>>> >> > (Stream=3, vSize=1)
>>>> >> > Splitting Node 32765, score 9.997541e+09
>>>> >> > (Stream=3, vSize=1)
>>>> >> > Splitting Node 32767, score 9.997541e+09
>>>> >> > (Stream=3, vSize=1)
>>>> >> > Splitting Node -32767, score 9.997541e+09
>>>> >> > (Stream=3, vSize=1)
>>>> >> > Splitting Node -32765, score 9.997541e+09
>>>> >> > (Stream=3, vSize=1)
>>>> >> > Splitting Node -32763, score 9.997541e+09
>>>> >> >
>>>> >> > The node index went negative, which looks like an overflow.
>>>> >> > Could you tell me in which file I can check the distribution of
>>>> >> > stream[3]?
>>>> >> > Thank you for your help.
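
The jump from 32767 to -32767 in the log excerpt above is what a signed 16-bit counter would produce when it is incremented by 2 past its maximum. The following Python sketch reproduces that sequence. Whether HHEd actually stores the node index in a 16-bit type is an assumption; the thread only shows the printed values. The helper name `as_int16` is hypothetical.

```python
def as_int16(value):
    """Wrap an integer as a signed 16-bit two's-complement field would."""
    return (value + 0x8000) % 0x10000 - 0x8000


# Start at the first index in the log excerpt and step by 2, as the log does.
index = 32763
seen = []
for _ in range(6):
    seen.append(as_int16(index))
    index += 2

print(seen)  # [32763, 32765, 32767, -32767, -32765, -32763]
```

The wrapped values match the node indices printed in the log, which is consistent with the overflow explanation. The overflow itself would only be a symptom of the split loop never terminating.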
>>>> >> >
>>>> >> > Regards,
>>>> >> > Jay
>>>> >> >
>>>> >> > 2012/2/23 Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>
>>>> >> >>
>>>> >> >> Hi,
>>>> >> >>
>>>> >> >> This value is the node index in "reg.tree".
>>>> >> >>
>>>> >> >>  printf("Splitting Node %d, score %e\n", r->nodeIndex, score);
>>>> >> >>
>>>> >> >> The index is always a positive value.
>>>> >> >> I don't know why the split is in a loop...
>>>> >> >> Could you check the distribution of stream[3]?
>>>> >> >>
>>>> >> >> Regards,
>>>> >> >> Keiichiro Oura
>>>> >> >>
>>>> >> >>
>>>> >> >> 2012/2/23 li jay <lij.acd@xxxxxxxxx>:
>>>> >> >> > Thank you for your reply.
>>>> >> >> >
>>>> >> >> > I've tried HTS-2.2, replacing the HTS commands with the ones from
>>>> >> >> > HTS-2.2. The result was exactly the same as with HTS-2.1.1: the
>>>> >> >> > score stayed the same, and the nodes split endlessly.
>>>> >> >> >
>>>> >> >> > What do you mean by 'Splitting Node' being a negative value? Do you
>>>> >> >> > mean the score?
>>>> >> >> >
>>>> >> >> > Regards,
>>>> >> >> > Jay
>>>> >> >> >
>>>> >> >> > 2012/2/22 Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>
>>>> >> >> >>
>>>> >> >> >> Hi,
>>>> >> >> >>
>>>> >> >> >> Could you try HTS-2.2?
>>>> >> >> >> I don't know why 'Splitting Node' is a negative value.
>>>> >> >> >>
>>>> >> >> >> Regards,
>>>> >> >> >> Keiichiro Oura
>>>> >> >> >>
>>>> >> >> >> 2012/2/22 li jay <lij.acd@xxxxxxxxx>:
>>>> >> >> >> > Hi,
>>>> >> >> >> >
>>>> >> >> >> > I've been trying to build a regression tree for speaker adaptation.
>>>> >> >> >> > I am using HTS 2.1.1. I've trained an average voice model from 4000
>>>> >> >> >> > sentences (about 2.5 hrs) of 20 speakers. Generating voice with the
>>>> >> >> >> > average voice model was successful. I wanted to apply speaker
>>>> >> >> >> > adaptation to this average voice model, so I tried to build a
>>>> >> >> >> > regression tree with the command below:
>>>> >> >> >> > /usr/local/HTS-2.1.1/bin/HHEd -A -B \
>>>> >> >> >> >   -C /home/jay/TTS/try/AST_female_20_speakers_2/configs/trn.cnf \
>>>> >> >> >> >   -D -T 1 -p -i \
>>>> >> >> >> >   -H /home/jay/TTS/try/AST_female_20_speakers_2/models/cmp/re_clustered.mmf \
>>>> >> >> >> >   -M /home/jay/TTS/try/AST_female_20_speakers_2/models/cmp/regTrees \
>>>> >> >> >> >   /home/jay/TTS/try/AST_female_20_speakers_2/edfiles/cmp/reg.hed \
>>>> >> >> >> >   /home/jay/TTS/try/AST_female_20_speakers_2/data/lists/full.list
>>>> >> >> >> >
>>>> >> >> >> > The problem was that the splitting of nodes did not finish. It
>>>> >> >> >> > seemed to be in a loop, and the score stayed the same, so the HHEd
>>>> >> >> >> > command never stopped. The log file shows the following:
>>>> >> >> >> >
>>>> >> >> >> > HTK Configuration Parameters[10]
>>>> >> >> >> >   Module/Tool     Parameter                  Value
>>>> >> >> >> > #                 MINDUR                         5
>>>> >> >> >> > #                 MAXSTDDEVCOEF                 10
>>>> >> >> >> > #                 APPLYDURVARFLOOR              TRUE
>>>> >> >> >> > #                 DURVARFLOORPERCENTILE          1.000000
>>>> >> >> >> > #                 SHRINKOCCTHRESH  Vector 4 500.0 100.0 100.0 100.0
>>>> >> >> >> > #                 VFLOORSCALESTR  Vector 4 0.01 0.01 0.01 0.01
>>>> >> >> >> > #                 MINLEAFOCC                     0
>>>> >> >> >> > #                 NATURALWRITEORDER              TRUE
>>>> >> >> >> > #                 NATURALREADORDER              TRUE
>>>> >> >> >> > #                 APPLYVFLOOR                 TRUE
>>>> >> >> >> >
>>>> >> >> >> > // construct regression class tree
>>>> >> >> >> > RC 32 reg
>>>> >> >> >> >  Building regression tree with 32 terminals (4 streams)
>>>> >> >> >> > Creating regression class tree with ident reg.tree and baseclass reg.base
>>>> >> >> >> > Splitting Node 1, score 1.000000e+10
>>>> >> >> >> > (Stream splitting)
>>>> >> >> >> > Splitting Node 3, score 1.000000e+10
>>>> >> >> >> > (Stream splitting)
>>>> >> >> >> > Splitting Node 5, score 1.000000e+10
>>>> >> >> >> > (Stream splitting)
>>>> >> >> >> > Splitting Node 7, score 1.000000e+10
>>>> >> >> >> > (MSD splitting)
>>>> >> >> >> > Splitting Node 6, score 1.000000e+10
>>>> >> >> >> > (MSD splitting)
>>>> >> >> >> > Splitting Node 10, score 8.998759e+10
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > Splitting Node 13, score 2.999760e+10
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > Splitting Node 4, score 1.000000e+10
>>>> >> >> >> > (MSD splitting)
>>>> >> >> >> > Splitting Node 15, score 9.997541e+09
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > Splitting Node 19, score 9.997541e+09
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > Splitting Node 21, score 9.997541e+09
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > ...
>>>> >> >> >> > ...
>>>> >> >> >> > ...
>>>> >> >> >> > Splitting Node -16495, score 9.997541e+09
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > Splitting Node -16493, score 9.997541e+09
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> > Splitting Node -16491, score 9.997541e+09
>>>> >> >> >> > (Stream=3, vSize=1)
>>>> >> >> >> >
>>>> >> >> >> > Could you help me with this problem? My questions are:
>>>> >> >> >> > 1: What could cause this endless node splitting?
>>>> >> >> >> > 2: Could it be a problem with the average voice modeling? Is there
>>>> >> >> >> > any option to enable average voice modeling? I trained the average
>>>> >> >> >> > model just like a speaker-dependent model, with the same scripts;
>>>> >> >> >> > only the training data came from different people.
>>>> >> >> >> >
>>>> >> >> >> > Thank you.
>>>> >> >> >> >
>>>> >> >> >> > Regards,
>>>> >> >> >> > Jay


