
[hts-users:00079] Re: changing sample rate


Nicholas,

> If I understand correctly the duration/f0/melcep models do not depend on
> the sample rate. So theoretically one could use same models for, say,
> 16000 and 8000 samples/second.

F0 and duration models can be used at other sampling rates,
but spectrum (mel-cepstrum) models cannot: a mel-cepstrum
vector extracted at a sampling rate of 16 kHz represents a spectrum
from 0 to 8 kHz, whereas one extracted at 8 kHz should represent a
spectrum from 0 to 4 kHz.  This cannot be solved by changing the
frequency warping parameter "alpha," which you referred to as
"vocal tract length."
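
To illustrate the point above, here is a minimal sketch, assuming the
usual first-order all-pass form of the frequency warping.  The function
names and the alpha values 0.42 and 0.31 are illustrative choices, not
taken from any particular HTS setup.  The warping only reshapes the
axis between 0 and the Nyquist frequency; both end points stay fixed,
so no value of alpha changes which physical band a vector covers.

```python
import math

def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at the given sampling rate."""
    return sample_rate_hz / 2.0

def warp(omega, alpha):
    """Phase response of a first-order all-pass filter (assumed warping).

    omega is the normalized angular frequency in [0, pi];
    alpha is the warping parameter with |alpha| < 1.
    """
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))

def to_hz(omega, sample_rate_hz):
    """Convert a normalized angular frequency to Hz."""
    return omega / (2.0 * math.pi) * sample_rate_hz

if __name__ == "__main__":
    # Illustrative (sampling rate, alpha) pairs; the alphas are assumptions.
    for rate, alpha in ((16000, 0.42), (8000, 0.31)):
        top = warp(math.pi, alpha)   # the top of the band never moves
        mid = warp(math.pi / 2.0, alpha)
        print(rate, "Hz sampling: band is 0 to", nyquist_hz(rate), "Hz;",
              "warped top =", round(top, 6),
              "warped midpoint =", round(mid, 6))
```

Whatever alpha is chosen, the top of the warped axis still corresponds
to 8 kHz for 16 kHz speech and to 4 kHz for 8 kHz speech.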

> I've also build a version from 8K samples, but it was also bad,
> probably about the same as described above.
> (16K voices are perfectly fine).
> 
> (I also did a brute resample hack which took every second sample at
> 16K thus yielding 8K.
> It's relatively fine with the exception of sibilants.)

Down-sampling the original speech signal from 16 kHz to 8 kHz is
the proper way.  The SPTK command "ds" can do this.  We also
constructed an HTS voice with 10 kHz sampling, and there was no
problem with it.
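
As a rough illustration of why keeping every second sample hurts
sibilants, here is a minimal sketch in plain Python.  It does not
reproduce what "ds" does internally; the filter (a Hamming-windowed
sinc with 63 taps and a cutoff of 0.23 cycles per sample) and all
function names are assumptions chosen for the example.  A 6 kHz tone
stands in for sibilant energy above the new 4 kHz limit.

```python
import math

def lowpass_taps(num_taps=63, cutoff=0.23):
    """Hamming-windowed sinc low-pass; cutoff in cycles per input sample."""
    mid = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - mid
        if t == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)                 # normalize to unity gain at 0 Hz
    return [h / gain for h in taps]

def naive_halve(x):
    """Keep every second sample with no filtering (the 'brute' hack)."""
    return x[::2]

def filtered_halve(x, taps):
    """Low-pass filter, then keep every second output sample."""
    out = []
    for start in range(0, len(x) - len(taps) + 1, 2):
        out.append(sum(h * x[start + k] for k, h in enumerate(taps)))
    return out

def tone(freq_hz, rate_hz, length):
    """Unit-amplitude sine wave."""
    return [math.sin(2.0 * math.pi * freq_hz * n / rate_hz)
            for n in range(length)]

def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))

if __name__ == "__main__":
    taps = lowpass_taps()
    for freq in (1000.0, 6000.0):
        x = tone(freq, 16000.0, 1600)
        print(freq, "Hz tone: naive RMS =", round(rms(naive_halve(x)), 4),
              "filtered RMS =", round(rms(filtered_halve(x, taps)), 4))
```

Without the filter, the 6 kHz tone survives at full strength and folds
down to 2 kHz in the 8 kHz signal; with the filter it is largely
removed, while the 1 kHz tone passes almost unchanged.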

Keiichi Tokuda
tokuda@xxxxxxxxxxxxxxxx
http://kt-lab.ics.nitech.ac.jp/~tokuda/

References
[hts-users:00078] changing sample rate, Nicholas Volk