
[hts-users:04291] Vocoding natural speech


Hi,

We are trying to vocode natural speech to use as a comparison in a naturalness test.  I have been following parts of the HTS demo scripts to do this.  In particular, first converting .wav to .raw, then doing 'make lf0' and 'make mgc', and finally using the 'gen_wave' function in Training.pl.  

The problem is that I get many instances of the warning "x2x : warning: input data is over the range of type 'short'!", and the output audio then has a lot of loud squeaks and pops.  I've isolated the problem to the .wav to .raw conversion: when I do the vocoding starting from the .raw files that came with the SLT demo, instead of converting them from .wav myself, it comes out sounding fine.

I've tried two methods to convert .wav to .raw: first, this one that was in the README in the demo:

ch_wave -c 0 -F 32000 -otype raw in.wav | x2x +sf | interpolate -p 2 -d | ds -s 43 | x2x +fs > out.raw

and this one, that I found in an earlier thread on this mailing list:

for wav in ./wav/*.wav
do
  raw=./raw/`basename $wav .wav`.raw
  sox -c 1 -s -w -t wav -r 16000 $wav -c 1 -s -w -t wav -r 48000 $raw
done

and for both of these, the audio comes out sounding bad.
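
One sanity check that may help narrow this down is to see whether the converted .raw files are really headerless. A WAV file begins with the four ASCII bytes "RIFF", and the sox loop above passes '-t wav' for the output as well as the input, so it may be writing a WAV header into files that are only named .raw. The sketch below assumes the converted files are in ./raw as in that loop; 'has_riff_header' is a hypothetical helper written for this check, not part of HTS or SPTK. It cannot say anything about other possible causes, such as byte order or sample format.

```shell
# Hypothetical helper (not part of HTS or SPTK): succeeds if the file
# starts with the 4-byte ASCII tag "RIFF" that begins a WAV header.
has_riff_header() {
  # Dump the first 4 bytes as hex so that NUL bytes in raw audio are harmless.
  first4=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$first4" = "52494646" ]   # "RIFF" in hex
}

# Assumption: converted files live in ./raw, as in the sox loop above.
for raw in ./raw/*.raw
do
  [ -e "$raw" ] || continue    # skip if the glob matched nothing
  if has_riff_header "$raw"
  then
    echo "$raw: starts with RIFF (WAV header present)"
  else
    echo "$raw: no RIFF tag at start"
  fi
done
```

Running the same check on the .raw files shipped with the SLT demo gives a baseline to compare against.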

So, does anyone know what might be causing this, or how the .raw files in the SLT demo were generated from .wav, or know of any other way to vocode natural speech in a .wav file?

Thanks very much,
Erica

Follow-Ups
[hts-users:04292] Re: Vocoding natural speech, Rasmus Dall
[hts-users:04294] Re: Vocoding natural speech, Matt Shannon