
[hts-users:04334] Re: bad voice output for test sentences


Thanks very much for the advice.  It is true that the original data was 16kHz and then up-sampled to use with the demo.

I tried the second suggestion, adding the -E -60.0 option, but the results were the same.

I looked at the 'alice' output utterances more closely and discovered that many of those also came out silent; I did not notice this originally because I had only listened to the first few.  The utterances that came out silent tended to be the shorter ones, and our test utterances were also somewhat short.  In the end I was able to get our test utterances synthesized by combining them into fewer, longer utterances, re-making the label files, and synthesizing from those.
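In case it is useful to anyone else, the combining step can be sketched roughly as follows.  This is a hypothetical helper, not the script we actually used: it assumes the standard HTS label layout of "start end label" per line, with times in HTK 100 ns units, and simply shifts each subsequent file's timestamps by the running end time.

```python
def merge_labels(*label_texts):
    """Concatenate HTS label files (as strings) into one longer
    utterance, offsetting each file's start/end times (in HTK
    100 ns units) by the end time of everything before it."""
    merged = []
    offset = 0
    for text in label_texts:
        last_end = 0
        for line in text.strip().splitlines():
            start, end, label = line.split(None, 2)
            start, end = int(start), int(end)
            merged.append(f"{start + offset} {end + offset} {label}")
            last_end = end
        offset += last_end
    return "\n".join(merged)
```

(Real full-context labels would also need the phone context at the join repaired, which this sketch ignores.)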

We are happy that we were able to synthesize our test utterances in the end, but we still don't know why shorter utterances might cause a problem - do you have any insight into why this might be?

Thanks,
Erica

On Wed, Nov 18, 2015 at 8:10 PM, Keiichiro Oura <uratec@xxxxxxxxxxxx> wrote:
Hi,

I suppose the training data were recorded at a 16kHz sampling rate
and then up-sampled for training.
There seem to be two solutions.

1.
Run training scripts with 16kHz setting and 16kHz data.

2.
Add -E -60.0 to mcep/mgc command in data/Makefile and run training
scripts with 48kHz setting and up-sampled data.
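For reference, the edit to data/Makefile would look something like the fragment below.  This is only a sketch: the exact mcep options and rule layout differ between demo versions, so only the added -E -60.0 flag is the point here.

```makefile
# Sketch only: the real mgc rule in data/Makefile has more options.
# -E -60.0 floors the frame energy, which can help when up-sampled
# data contains frames of near-digital silence.
mgc:
	mcep -a $(FREQWARP) -m $(MGCORDER) -l $(FRAMELEN) -E -60.0 \
		$${raw}.float > mgc/$${base}.mgc
```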

Regards,
Keiichiro Oura




2015-11-19 4:51 GMT+09:00 Erica Cooper <ecooper@xxxxxxxxxxxxxxx>:
> Hi,
>
> I checked and I had mistakenly synthesized that test audio from the mono
> labels rather than full.
> Using hts-engine to synthesize from the full labels still produces bad
> output, though; it's mostly silence:
> http://www.cs.columbia.edu/~ecooper/audio/nat_0001-2.wav
> Here is the fullcontext label that was used:
> http://www.cs.columbia.edu/~ecooper/audio/nat_0001.lab
> I don't know whether it makes a difference, but the training data used for
> this voice has been 'monotonized' (constant lf0 for voiced regions).  We
> have already done this for another voice trained the same way on different
> data and all synthesized test utterances came out properly.
>
> Thanks,
> Erica
>
>
> On Wed, Nov 18, 2015 at 10:09 AM, Keiichiro Oura <uratec@xxxxxxxxxxxxxxx>
> wrote:
>>
>> Hi,
>>
>> Let me see the test labels.
>>
>> Regards,
>> Keiichiro Oura
>>
>>
>>
>> 2015-11-18 23:43 GMT+09:00 Erica Cooper <ecooper@xxxxxxxxxxxxxxx>:
>> > Hi all,
>> >
>> > I have trained a voice using the HTS demo script and my own data.  I have
>> > also added my own test sentences to gen.scp.  The 'alice' sentences come
>> > out fine; however, my own test sentences come out with no audible speech.
>> > They sound like this (hts_engine):
>> >
>> > http://www.cs.columbia.edu/~ecooper/audio/nat_0001.wav
>> >
>> > I would expect that there may be something wrong with my test sentence
>> > labels; however, I have already been using them with dozens of other
>> > voices with no problems.  If anyone has any ideas or pointers as to what
>> > might be causing this and how to fix it, it would be greatly appreciated.
>> >
>> > Thanks,
>> > Erica
>>
>>
>




Follow-Ups
[hts-users:04335] Re: bad voice output for test sentences, Simon King
References
[hts-users:04325] bad voice output for test sentences, Erica Cooper
[hts-users:04326] Re: bad voice output for test sentences, Keiichiro Oura
[hts-users:04327] Re: bad voice output for test sentences, Erica Cooper
[hts-users:04329] Re: bad voice output for test sentences, Keiichiro Oura