[hts-users:00500] saturation in the raw
- Subject: [hts-users:00500] saturation in the raw
- From: Alexis Moinet <alexis.moinet@xxxxxxxxxx>
- Date: Thu, 25 Jan 2007 16:13:36 +0100
Hi mailing list,
From time to time (not very often), some samples x of the speech
synthesized by hts_engine (the xs in vocoder.cpp) are greater than
32767 (x > 1) or less than -32768 (x < -1), which means that the line
"xs = (short) x;" in vocoder.cpp causes some sort of saturation.
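For illustration, here is a minimal sketch of an explicit saturating conversion that could stand in for the bare cast. The helper name to_pcm16 is hypothetical and not part of hts_engine, and the sketch assumes a 16-bit short and that x is a double. In C++, converting a floating-point value that does not fit in the target integer type is undefined behavior, so clamping first at least makes the result predictable.

```cpp
#include <cmath>

static_assert(sizeof(short) == 2, "this sketch assumes a 16-bit short");

// Hypothetical helper, not part of hts_engine: converts one synthesized
// sample to 16-bit PCM and clamps out-of-range values to the nearest limit.
short to_pcm16(double x) {
    if (std::isnan(x)) return 0;        // assumption: treat NaN as silence
    if (x > 32767.0) return 32767;      // clamp positive overshoot
    if (x < -32768.0) return -32768;    // clamp negative overshoot
    return static_cast<short>(x);       // in range: truncates toward zero, like the original cast
}
```

With this, "xs = (short) x;" would become "xs = to_pcm16(x);". Clamping still distorts the clipped samples; it only makes the distortion well defined.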
We synthesized the same sentence with two different training results:
one trained on about 1000 sentences and the other on 3000 sentences.
With the 1000-sentence training we got no saturation, but with the
3000-sentence training we got saturation in one state (10 frames, or
50 ms) of one phoneme (5 states). It was a "small" saturation (< 5 %).
We compared the features (means and variances for cepstrum and f0) of
the models selected by hts_engine for the 1000- and 3000-sentence
trainings, and they are nearly the same.
To correct this, we simply scaled every "x" by 0.75 before computing
xs, but we don't think this is an optimal solution because it reduces
the dynamic range of the signal. (Another solution is to write the raw
output as float or as 32-bit integers, but we don't find that practical.)
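One possible alternative to the fixed 0.75 factor, sketched here under assumptions, is to scale only when the utterance actually exceeds the 16-bit range, and then only by as much as needed. The functions fit_gain and to_pcm16_scaled are hypothetical names, not hts_engine code, and the sketch assumes the whole utterance is available in a buffer before it is written, which the sample-by-sample loop in vocoder.cpp may not provide.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: gain that makes the largest magnitude in the buffer
// just fit in 16-bit PCM. Returns 1.0 when the buffer already fits, so
// utterances without overshoot keep their full level.
double fit_gain(const std::vector<double>& samples) {
    double peak = 0.0;
    for (double x : samples) {
        peak = std::max(peak, std::fabs(x));
    }
    const double full_scale = 32767.0;
    return peak > full_scale ? full_scale / peak : 1.0;
}

// Hypothetical helper: applies the gain, then converts to 16-bit samples
// with the same truncating cast as the original code.
std::vector<short> to_pcm16_scaled(const std::vector<double>& samples) {
    const double gain = fit_gain(samples);
    std::vector<short> out;
    out.reserve(samples.size());
    for (double x : samples) {
        out.push_back(static_cast<short>(x * gain));
    }
    return out;
}
```

For an overshoot below 5 % the resulting gain stays above about 0.95, so far less level is lost than with 0.75. The trade-off is that the gain differs from one utterance to the next, and that synthesis needs a second pass over the buffered samples.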
Has anyone encountered this problem before?
Thanks
Alexis Moinet
PhD student
FPMs - TCTS Lab
- Follow-Ups
- [hts-users:00503] Re: saturation in the raw, Nicholas Volk