Hi,
We are trying to vocode natural speech to use as a comparison in a
naturalness test. I have been following parts of the HTS demo scripts to
do this. In particular, first converting .wav to .raw, then doing 'make
lf0' and 'make mgc', and finally using the 'gen_wave' function in
Training.pl.
The problem is that I get many instances of the warning "x2x : warning:
input data is over the range of type 'short'!", and the output audio then
has a lot of loud squeaks and pops. I've isolated the problem to the .wav
to .raw conversion: when I do the vocoding starting from the .raw files
that came with the SLT demo, instead of converting them from .wav myself,
it comes out sounding fine.
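One way to compare a self-converted .raw file against one from the SLT demo
is to look at the range of sample values in each. The following is only a
minimal sketch: raw_range is a hypothetical helper name, and it assumes the
files are headerless 16-bit signed PCM in the machine's native byte order.

```sh
# Hypothetical helper: print the minimum and maximum sample value of a
# headerless 16-bit signed PCM file.
# Assumption: the file's byte order matches the machine's, because od
# reads 2-byte integers in native byte order.
raw_range() {
    # -An: no offsets, -v: do not collapse repeated lines, -td2: signed 16-bit
    od -An -v -td2 "$1" | awk '
        {
            for (i = 1; i <= NF; i++) {
                v = $i + 0
                if (n == 0 || v < min) min = v
                if (n == 0 || v > max) max = v
                n++
            }
        }
        END { if (n > 0) print min, max }'
}
```

For example, running `raw_range out.raw` and `raw_range` on one of the demo
.raw files side by side would show whether the two differ noticeably in
range. A range pinned at -32768 and 32767 would be consistent with clipping
or with a byte-order mismatch, though this check alone cannot tell which.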
I've tried two methods to convert .wav to .raw. The first is this one, from
the README in the demo:
ch_wave -c 0 -F 32000 -otype raw in.wav | x2x +sf | interpolate -p 2 -d |
ds -s 43 | x2x +fs > out.raw
and the second is this one, which I found in an earlier thread on this mailing list:
for wav in ./wav/*.wav
do
raw=./raw/`basename $wav .wav`.raw
sox -c 1 -s -w -t wav -r 16000 $wav -c 1 -s -w -t wav -r 48000 $raw
done
and for both of these, the audio comes out sounding bad.
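One thing that may be worth ruling out (an assumption, not a confirmed
cause) is that a converted file still begins with a WAV header, since the
.raw files are expected to be headerless. A minimal sketch, where
has_riff_header is a hypothetical helper name and ./raw is the output
directory used in the loop above:

```sh
# Hypothetical helper: succeeds if the file starts with the 4-byte "RIFF"
# tag that opens a WAV header, and fails otherwise.
has_riff_header() {
    # od -c prints the bytes as characters; tr strips the padding.
    [ "$(head -c 4 "$1" | od -An -c | tr -d ' \n')" = "RIFF" ]
}

# Report any supposedly headerless file that still carries a WAV header.
for f in ./raw/*.raw
do
    [ -e "$f" ] || continue   # skip the literal pattern when nothing matches
    if has_riff_header "$f"
    then
        echo "$f: starts with a RIFF header"
    fi
done
```

If nothing is printed, the files at least do not start with a RIFF tag.
The sample rate and byte order would still need to be checked separately.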
So, does anyone know what might be causing this, or how the .raw files in
the SLT demo were generated from .wav, or know of any other way to vocode
natural speech in a .wav file?
Thanks very much,
Erica