
[hts-users:04294] Re: Vocoding natural speech


Hi Erica,

The way the HTS demo scales waveforms (sometimes called
"normalization") at training and synthesis time has changed over the
past few releases, and it differs between the non-STRAIGHT and STRAIGHT
demos (I asked about this earlier: see
http://hts.sp.nitech.ac.jp/hts-users/spool/2015/msg00008.html).  The
warnings you are getting may simply come from clipping caused by a bad
scale factor somewhere in your sequence of steps.  As a workaround you
could try dividing the sample values in the raw files by 32767, or by
some other value.  That should at least tell you whether the problem is
only a matter of scaling or something more complicated.
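
A rough sketch of that scaling check in Python, in case it helps.  It
assumes the samples at the failing step are headerless 32-bit floats in
native byte order (which is what x2x +sf would write); the function
names and the 0.9 headroom factor are made up for illustration and are
not part of the HTS demo or SPTK:

```python
import array

INT16_MIN, INT16_MAX = -32768, 32767


def read_float_raw(path):
    """Read a headerless file of native-endian 32-bit floats (assumed format)."""
    samples = array.array("f")
    with open(path, "rb") as f:
        samples.frombytes(f.read())
    return samples


def count_out_of_range(samples):
    """Count samples that would not fit in a 16-bit 'short'."""
    return sum(1 for s in samples if s > INT16_MAX or s < INT16_MIN)


def fit_to_int16(samples, headroom=0.9):
    """Rescale so the peak sits at headroom * 32767, then round to int16."""
    peak = max((abs(s) for s in samples), default=0.0)
    scale = 1.0 if peak == 0 else headroom * INT16_MAX / peak
    return array.array("h", (int(round(s * scale)) for s in samples))


def write_short_raw(path, shorts):
    """Write int16 samples as a headerless raw file."""
    with open(path, "wb") as f:
        shorts.tofile(f)
```

If count_out_of_range() is non-zero for the data you feed to x2x +fs,
that would be consistent with the warning being a pure scaling problem;
fit_to_int16() is just one possible way to pick the "some other value".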

Cheers,

Matt


On Mon, 13 Jul 2015 10:58:52 -0400
Erica Cooper <ecooper@xxxxxxxxxxxxxxx> wrote:

> Hi,
> 
> We are trying to vocode natural speech to use as a comparison in a
> naturalness test.  I have been following parts of the HTS demo
> scripts to do this.  In particular, first converting .wav to .raw,
> then doing 'make lf0' and 'make mgc', and finally using the
> 'gen_wave' function in Training.pl.
> 
> The problem is, I get a lot of the error "x2x : warning: input data
> is over the range of type 'short'!"  And then the output audio has a
> lot of loud squeaks and pops.  I've isolated the problem to the .wav
> to .raw conversion: when I do the vocoding starting with the .raw
> files that came with the SLT demo, instead of converting them from
> .wav myself, it comes out sounding fine.
> 
> I've tried two methods to convert .wav to .raw: first, this one that
> was in the README in the demo:
> 
> ch_wave -c 0 -F 32000 -otype raw in.wav | x2x +sf |
>   interpolate -p 2 -d | ds -s 43 | x2x +fs > out.raw
> 
> and this one, that I found in an earlier thread on this mailing list:
> 
> for wav in ./wav/*.wav
> do
>   raw=./raw/`basename $wav .wav`.raw
>   sox -c 1 -s -w -t wav -r 16000 $wav -c 1 -s -w -t wav -r 48000 $raw
> done
> 
> and for both of these, the audio comes out sounding bad.
> 
> So, does anyone know what might be causing this, or how the .raw
> files in the SLT demo were generated from .wav, or know of any other
> way to vocode natural speech in a .wav file?
> 
> Thanks very much,
> Erica

