
[hts-users:03851] Re: Quality from demos


It works! I also adjusted the values in Festival to match the bit depth and sampling rate.
Thank you so much,
Marvin.




2013/8/19 Dietmar Schabus <schabus@xxxxxx>
How do you make .raw audio files from wave files? You need to convert to raw files at 48 kHz, 16 bit (2-byte short), signed integer, little endian, single channel.
Like this (bash syntax):

for wavfile in /your/wav/dir/*.wav
do
  # basename strips the input directory, so the output lands directly in data/raw
  sox "$wavfile" -V1 -t raw -r 48000 -b 16 -e signed-integer -L -c1 "/someplace/HTS-demo_CMU-ARCTIC-SLT/data/raw/$(basename "$wavfile" .wav).raw"
done

See man sox for details.

Your file sounds as if your .raw files are at 16 kHz. You can also use 16 kHz data, but then you need to specify that in the SAMPFREQ environment variable before running configure. Run ./configure --help for more info.
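
One way to check which sampling rate a headerless .raw file was written at is to compare the duration implied by its size with the known length of the recording. The following is only a minimal sketch, assuming 16-bit mono data as described above; the function name raw_duration_ms is made up for illustration.

```shell
# Duration in milliseconds implied by a headerless 16-bit mono raw file
# at a given sampling rate. raw_duration_ms is a hypothetical helper name.
raw_duration_ms() {
  file=$1
  rate=$2
  bytes=$(wc -c < "$file")
  # 2 bytes per sample, one channel
  echo $(( bytes * 1000 / (2 * rate) ))
}
```

If calling it with 48000 gives roughly a third of the real recording length, while 16000 gives the right length, the file is most likely at 16 kHz.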

Best, Dietmar



On 2013-08-10 19:45, Marvin Coto wrote:
Hello Dietmar!
Thank you so much for your reply.
I recently changed the questions folder to fit my phone set, and the
results have improved a lot:

https://dl.dropboxusercontent.com/u/81143637/nueva.wav

The problem now, I think, is the pitch of the synthesized wave. I've tried
with male and female voices, and in both cases the results sound
lower-pitched than the original voice. I ran the demo script
cmu_us_arctic_slt as it comes, and the result was perfect.

I also found this in the log file: "x2x : warning: input data is over the
range of type 'short'!" I read about it on the mailing list but didn't
understand whether I have to change something in the scripts.
Thanks in advance for your attention,

Marvin.


2013/8/7 Dietmar Schabus <schabus@xxxxxx>


    Hello Marvin,

    I think it's due to the questions.

    If the questions do not match your phone set, the decision tree
    based clustering cannot produce reasonable results (everything will
    be answered "no").
    I think the easiest way to obtain questions is to write a simple
    script that generates them from a set of classes (like "vowel" or
    "fricative" etc.) where the phones (of your phone set) that belong
    to each class are listed.
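
Such a script could look roughly like the sketch below. It assumes the label layout p1^p2-p3+p4=p5@... and the QS "..." {...} question syntax used in the HTS demo question files; the function name make_questions and the example classes are made up for illustration, and the phones must be replaced by those of your own phone set.

```shell
# Emit HTS-style context questions for one phone class.
# Usage: make_questions CLASS PHONE...
# Assumes labels of the form p1^p2-p3+p4=p5@... (as in the HTS demo labels).
make_questions() {
  class=$1
  shift
  # LL/L = two/one phone before, C = current, R/RR = one/two phones after
  for pos in LL L C R RR; do
    patterns=""
    for ph in "$@"; do
      case $pos in
        LL) pat="${ph}^*" ;;
        L)  pat="*^${ph}-*" ;;
        C)  pat="*-${ph}+*" ;;
        R)  pat="*+${ph}=*" ;;
        RR) pat="*=${ph}@*" ;;
      esac
      # Join the patterns with commas, without a leading comma
      patterns="${patterns:+$patterns,}$pat"
    done
    printf 'QS "%s-%s" {%s}\n' "$pos" "$class" "$patterns"
  done
}

# Example classes (illustrative only)
make_questions Vowel a e i o u
make_questions Nasal m n
```

Each call prints five questions for the class, one per context position, which can be redirected into a questions file.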

    Regards,
    Dietmar



    On 2013-08-06 22:06, Marvin Coto wrote:

        Hello!

        I'm trying to use the demo scripts of HTS 2.2 (the speaker-dependent
        training demo in English and Portuguese) for a voice in Spanish.
        I replaced the raw files with those from my Spanish database (184
        files made from good quality recordings) and the utt files with ones
        generated with Festival 2.1 from the text transcriptions of the audio
        files. Even though I haven't changed the questions folder, I was
        expecting a little more quality from the beginning.
        An example of the result, synthesized in Festival, can be heard here:

        https://dl.dropboxusercontent.com/u/81143637/ConEstoico.wav

        It is supposed to say "Con estoico respeto a la justicia adyacente
        guardo sus flechas". I'm sure nobody needs to speak Spanish to realize
        something is going very wrong. From that audio file, does anybody have
        any idea where the problem could be? Maybe the raw files, the way I'm
        generating the utt files, or the questions folder?

        In the last case, do I have to change the questions manually, or
        does that have to be done in HTK?

        Thanks in advance,

        Marvin.


--
----------------------------------------------------------------------
Dipl.-Ing. Dietmar Schabus | Researcher

phone +43 1 5052830-48 | fax -99 | schabus@xxxxxx | http://userver.ftw.at/~schabus/

FTW Telecommunications Research Center Vienna
Donau-City-Straße 1/3 | 1220 Vienna | Austria | www.ftw.at


