Hi,
We are trying to vocode natural speech to use as a comparison in a naturalness test. I have been following parts of the HTS demo scripts to do this. Specifically, I first convert .wav to .raw, then run 'make lf0' and 'make mgc', and finally use the 'gen_wave' function in Training.pl.
The problem is that I get the warning "x2x : warning: input data is over the range of type 'short'!" many times, and the output audio is full of loud squeaks and pops. I've isolated the problem to the .wav to .raw conversion: when I start the vocoding from the .raw files that came with the SLT demo, instead of converting them from .wav myself, the output sounds fine.
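Because the demo's own .raw files vocode cleanly, one way to narrow this down is to compare a self-converted file with a demo file sample by sample. The sketch below assumes both files are headerless 16-bit signed mono data in the host's byte order, since that is how `od -t d2` reads them. `raw_stats` is a hypothetical helper name and is not part of HTS or SPTK.

```shell
#!/bin/sh
# raw_stats: hypothetical helper that prints the sample count and the peak
# absolute amplitude of a headerless 16-bit signed mono file.
# Assumption: samples are stored in the host's byte order (how od reads them).
raw_stats() {
    od -An -v -t d2 "$1" | awk '
        {
            for (i = 1; i <= NF; i++) {
                n++                          # one field per 16-bit sample
                v = ($i < 0) ? -$i : $i      # absolute value
                if (v > peak) peak = v
            }
        }
        END { printf "samples=%d peak=%d\n", n, peak }'
}
```

For example, running `raw_stats mine.raw` and `raw_stats demo.raw` (hypothetical file names) side by side. A very different peak level, or a sample count that does not match the clip's duration at the intended sampling rate, would point at the conversion step.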
I've tried two methods to convert .wav to .raw. The first is the one from the demo's README:
ch_wave -c 0 -F 32000 -otype raw in.wav | x2x +sf | interpolate -p 2 -d | ds -s 43 | x2x +fs > out.raw
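The README pipeline appears to target 48 kHz output, the same rate the sox loop requests. The worked arithmetic below rests on assumptions about the SPTK options rather than on anything confirmed in the demo: it assumes `interpolate -p 2` doubles the sampling rate and `ds -s 43` down-samples by 4:3.

```shell
#!/bin/sh
# Sampling rate through each stage of the README pipeline (assumed behaviour).
rate=32000                 # ch_wave -F 32000: resample the input to 32 kHz
rate=$((rate * 2))         # interpolate -p 2: 2x up-sampling    -> 64000
rate=$((rate * 3 / 4))     # ds -s 43: 4:3 down-sampling         -> 48000
echo "output rate: $rate Hz"
```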
The second is a loop I found in an earlier thread on this mailing list:
# Resample each 16 kHz .wav in ./wav to 48 kHz, writing to ./raw
for wav in ./wav/*.wav
do
    raw=./raw/$(basename "$wav" .wav).raw
    sox -c 1 -s -w -t wav -r 16000 "$wav" -c 1 -s -w -t wav -r 48000 "$raw"
done
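In that loop, the `-t wav` given just before the output file name sets the output file type. As a result, sox writes a WAV (RIFF) header into each .raw file instead of producing headerless samples. Whether that header matters for the later steps is not established here. The sketch below is one quick way to check whether the converted files start with such a header. `has_riff_header` is a hypothetical helper name, and the check only inspects the first four bytes.

```shell
#!/bin/sh
# has_riff_header: hypothetical helper; succeeds if the file begins with the
# four bytes "RIFF" (as WAV files do). od -c avoids NUL bytes in the shell.
has_riff_header() {
    [ "$(od -An -c -N 4 "$1" | tr -d ' \n')" = "RIFF" ]
}

# Report any files under ./raw that still carry a RIFF header.
for raw in ./raw/*.raw; do
    [ -e "$raw" ] || continue   # skip the literal pattern if nothing matched
    if has_riff_header "$raw"; then
        echo "$raw: starts with a RIFF header, not headerless raw data"
    fi
done
```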
With both methods, the vocoded audio sounds bad.
So, does anyone know what might be causing this? Alternatively, does anyone know how the .raw files in the SLT demo were generated from the .wav files, or another way to vocode natural speech from a .wav file?
Thanks very much,
Erica