
[hts-users:00065] Re: HTS voice output problems


Hi Heiga,

Thanks for your answers.


> Nicholas Volk wrote:
>
>> I put a sample file in http://www.bitlips.fi/HTS/ms_272_16.raw
>> (sample rate 16000, Lin16, Mono, Little Endian).
>> As you can see (probably not hear ;) there are some pitch-period-like
>> areas in the file. (The reason they have nothing above 5500 Hz
>> is that I upsampled my training data from 11K to 16K.)
>> Most of the file is however problematic: very low noise and clicks,
>> no speech. Any ideas what causes this would be appreciated.
>
> I can't hear this file.

I did `sox -r 16000 -w -s ms_272_16.raw ms_272_16.wav` to
give it a proper header. There's something that might be a vowel
just before the 2-second mark...
The new file is located at http://www.bitlips.fi/HTS/ms_272_16.wav .
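
For reference, the same header-wrapping step can be sketched in Python
with the standard `wave` module. This is only an illustration, assuming
the format stated above (16000 Hz, 16-bit signed, mono, little endian);
the helper name `raw_to_wav` and the four sample values are made up for
the example.

```python
import io
import struct
import wave


def raw_to_wav(raw_bytes, out, sample_rate=16000, channels=1, sample_width=2):
    """Wrap headerless little-endian PCM in a RIFF/WAV container.

    `out` may be a file name or a writable, seekable binary file object.
    """
    with wave.open(out, "wb") as wav:
        wav.setnchannels(channels)      # mono
        wav.setsampwidth(sample_width)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)   # 16000 Hz
        wav.writeframes(raw_bytes)      # header sizes are patched on close


# Hypothetical data: four 16-bit little-endian samples.
pcm = struct.pack("<4h", 0, 1000, -1000, 0)
buf = io.BytesIO()
raw_to_wav(pcm, buf)
```

For a real file, the bytes of ms_272_16.raw would be read in binary
mode and passed in place of `pcm`, with an output file name in place
of `buf`.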


> I think you can try to train HMM without upsampling.
> By modifying some part of HTS-demo/scripts/Training.in and
> HTS-demo_.../scripts/mkdata.in,
> you can run the training script at an 11 kHz sampling rate.
> If you want to use hts_engine for synthesis, defaults.h should also be
> modified.

The reason for the upsampling was my laziness. I thought it would
be safer and faster to use the defaults...

>> I used only 90 sentences as the training material while experimenting
>> (due to speed). Is the small training data the reason for this?
>
> I don't think so.
> Some of my colleagues trained HMMs with less than 50 utterances.
> Synthesized speech from these HMMs was intelligible.
> (Actually, the speaker in this experiment was very good.)
>

There's something I must be doing wrong... I'll have to dig deeper.

I blindly copied the four win/*.win files for AWB.
Could this be a problem?

I've made some tiny modifications to the HTS scripts since I'm running
this on Cygwin. In some cases the expansion of some/path/* grows too
large, and I've modified the scripts a bit to work around this problem.
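
One common way around an over-long some/path/* expansion is to hand the
files to a tool in several smaller batches. The sketch below shows that
idea in Python; it is not the actual modification made to the HTS
scripts, and the function name `batch_args`, the character budget, and
the file names are all hypothetical.

```python
def batch_args(paths, max_chars=8000):
    """Yield lists of paths whose joined length stays within max_chars.

    Each path is charged its own length plus one separating space.
    A single path longer than the budget still gets its own batch.
    """
    batch, size = [], 0
    for path in paths:
        cost = len(path) + 1
        if batch and size + cost > max_chars:
            yield batch
            batch, size = [], 0
        batch.append(path)
        size += cost
    if batch:
        yield batch


# Hypothetical file list and a deliberately tiny budget for illustration.
paths = ["data/%03d.raw" % i for i in range(10)]
batches = list(batch_args(paths, max_chars=40))
```

Each batch can then be passed to a separate invocation of the tool in
question, at the cost of starting the tool several times.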


Since I have a background in Festival and know very little about
HTK, I'd like to check one thing:
The question files contain a bunch of yes/no questions.
Each question lists a set of regexp-like patterns that count as a
yes-answer.
Are the label files only used together with the question files?
Are all the questions used for all the stages of training, or do
some stages, say F0 modelling, use only a subset of them?
(I guess that all are used and that the /A:, /B:, /C: et cetera
are there for human-readability and not for magic.)
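
To make the question/label relationship concrete, here is a small
Python sketch of how a question's pattern set can be tested against a
full-context label. It assumes glob-style matching ('*' for any string,
'?' for one character); the question names, patterns, and the label are
invented for the example and are not taken from the HTS demo. Note that
Python's `fnmatch` also treats '[' specially, which may not match the
behaviour of the HTK tools.

```python
from fnmatch import fnmatchcase


def answers_yes(label, patterns):
    """A question is answered 'yes' if any of its patterns matches the label."""
    return any(fnmatchcase(label, pattern) for pattern in patterns)


# Hypothetical questions: name -> set of patterns that mean "yes".
questions = {
    "C-Vowel": ["*-a+*", "*-e+*", "*-i+*"],  # current phone is a vowel
    "L-Silence": ["sil-*"],                  # left phone is silence
    "A-First-Is-1": ["*/A:1_*"],             # first value in the /A: field is 1
}

# Hypothetical full-context label: left-current+right, then feature fields.
label = "k-a+t/A:0_1/B:1-0"

matched = [name for name, pats in questions.items() if answers_yes(label, pats)]
```

In this sketch the "/A:" marker is just literal text that a pattern can
anchor on to pick out one field of the label.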


My utterance structure is different from AWB-english.
I don't have a phrase relation, and I most certainly don't use the ToBI
intonation model for non-English (though admittedly ToBI has been applied
with success to languages other than English).
I cheated by making all the ToBI-related features return '0', among
other things...
It might be that I fail to create any sensible F0.

best regards,
  Nicholas




