[hts-users:03851] Re: Quality from demos

2013/8/19 Dietmar Schabus <schabus@xxxxxx>

How do you make .raw audio files from wave files? You need to convert to raw files at 48 kHz, 16 bit (2-byte short), signed integer, little endian, single channel.
Like this (bash syntax):

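for wavfile in /your/wav/dir/*.wav
do
    # Strip the directory and the .wav suffix so the output lands in data/raw
    base=$(basename "$wavfile" .wav)
    sox -V1 "$wavfile" -t raw -r 48000 -b 16 -e signed-integer -L -c 1 "/someplace/HTS-demo_CMU-ARCTIC-SLT/data/raw/${base}.raw"
done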

See man sox for details.

Your file sounds like your .raws are at 16 kHz. You can also use 16 kHz data, but then you need to specify that in the SAMPFREQ environment variable before running configure. Run ./configure --help for more info.
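
A quick way to check which rate a headerless .raw file really has is to compare its implied duration with the duration of the original recording. The following is only a sketch: the function name raw_duration_ms is made up for illustration, and it assumes 16-bit mono data as described above.

```shell
#!/usr/bin/env bash
# Sketch: duration implied by a headerless raw file of a given size.
# Assumption: 16-bit (2-byte) mono samples, so samples = bytes / 2.
# raw_duration_ms is a hypothetical helper name, not part of HTS or SoX.
raw_duration_ms() {
    local bytes="$1" rate="$2"
    echo $(( bytes / 2 * 1000 / rate ))
}

# A 96000-byte file is 1 s long at 48 kHz but 3 s long at 16 kHz.
raw_duration_ms 96000 48000
raw_duration_ms 96000 16000
```

For a real file, the byte count can be taken from wc -c < file.raw. If the duration computed for 48000 is about a third of the real utterance length, the data is most likely at 16 kHz.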

Best, Dietmar

On 2013-08-10 19:45, Marvin Coto wrote:

Hello Dietmar!
Thank you so much for your reply.
I recently changed the questions folder to fit my phone set, and the
results have improved a lot:

https://dl.dropboxusercontent.com/u/81143637/nueva.wav

The problem now, I think, is the pitch of the synthesized wave. I've tried
with male and female voices, and in both cases the output sounds
lower-pitched than the original voice. I ran the demo script
cmu_us_arctic_slt as it comes, and the result was perfect.

I also saw this in the log file: "x2x : warning: input data is over the
range of type 'short'!" I read about it on the mailing list but didn't
understand whether I have to change something in the scripts.
Thanks in advance for your attention,

Marvin.

2013/8/7 Dietmar Schabus <schabus@xxxxxx>

Hello Marvin,

I think it's due to the questions.

If the questions do not match your phone set, the decision tree
based clustering cannot produce reasonable results (everything will
be answered "no").
I think the easiest way to obtain questions is to write a simple
script that generates them from a set of classes (like "vowel" or
"fricative" etc.) where the phones (of your phone set) that belong
to each class are listed.
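
Such a generator could look roughly like the sketch below. It assumes the label layout used by the demo (p1^p2-p3+p4=p5@...) and the QS question syntax of the demo's question file. The function name make_questions and the phones listed are only placeholders and must be replaced with your own phone set.

```shell
#!/usr/bin/env bash
# Sketch: emit one QS line per context position for a phone class.
# Assumption: full-context labels start with p1^p2-p3+p4=p5@...
# make_questions and the example phones below are hypothetical.
make_questions() {
    local class="$1"; shift
    local pos pre post phone patterns
    for pos in LL L C R RR; do
        # Delimiters that surround the phone at each context position
        case "$pos" in
            LL) pre='';   post='^*' ;;
            L)  pre='*^'; post='-*' ;;
            C)  pre='*-'; post='+*' ;;
            R)  pre='*+'; post='=*' ;;
            RR) pre='*='; post='@*' ;;
        esac
        patterns=''
        for phone in "$@"; do
            # Join the per-phone patterns with commas
            patterns+="${patterns:+,}${pre}${phone}${post}"
        done
        printf 'QS "%s-%s" {%s}\n' "$pos" "$class" "$patterns"
    done
}

make_questions Vowel a e i o u
make_questions Fricative f s x
```

Each call prints five questions (LL, L, C, R, RR) for the class, which can be redirected into the question file in place of the English ones.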

Regards,
Dietmar

On 2013-08-06 22:06, Marvin Coto wrote:

Hello!

I'm trying to use the demo scripts of HTS 2.2 (the speaker-dependent
training demo in English and Portuguese) for a voice in Spanish.
I replaced the raw files with those from my Spanish database (184 files
made from good-quality recordings) and the utt files (generated with
Festival 2.1 from the text transcriptions of the audio files). Even though I
haven't changed the questions folder, I expected a little more
quality from the beginning.
An example of the result, synthesized in Festival, can be heard here:

https://dl.dropboxusercontent.com/u/81143637/ConEstoico.wav

It is supposed to say "Con estoico respeto a la justicia adyacente guardo
sus flechas". I'm sure nobody needs to speak Spanish to realize that
something is going very wrong. From that audio file, does anybody have
any idea where the problem could be? Maybe the raw files, the way I'm
generating the utt files, or the questions folder?

In the last case, do I have to change the questions manually, or
does that have to be done in HTK?

Thanks in advance,

Marvin.

--
----------------------------------------------------------------------
Dipl.-Ing. Dietmar Schabus | Researcher

phone +43 1 5052830-48 | fax -99 | schabus@xxxxxx | http://userver.ftw.at/~schabus/

FTW Telecommunications Research Center Vienna
Donau-City-Straße 1/3 | 1220 Vienna | Austria | www.ftw.at