
[hts-users:00500] saturation in the raw


Hi mailing list,

From time to time (not very often), some samples x of the speech synthesized by hts_engine (the xs in vocoder.cpp) are greater than 32767 (x > 1) or lower than -32768 (x < -1), which means that the line "xs = (short) x;" in vocoder.cpp causes some sort of saturation.
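
For reference, here is a minimal sketch of a conversion that clips explicitly instead of relying on the cast. The helper name to_pcm16 is hypothetical and not part of hts_engine, and it assumes a 16-bit short. In C and C++, converting a floating-point value that does not fit in a short is undefined behaviour, so what the plain cast produces for such samples depends on the compiler and platform.

```cpp
#include <cmath>

// Hypothetical helper (not part of hts_engine): convert one synthesized
// sample to 16-bit PCM, clipping out-of-range values explicitly.
// Assumes short is 16 bits wide.
short to_pcm16(double x) {
    if (std::isnan(x)) return 0;          // treat invalid samples as silence
    if (x > 32767.0)   return 32767;      // clip positive overshoot
    if (x < -32768.0)  return -32768;     // clip negative overshoot
    return static_cast<short>(x);         // in range: truncate like the original cast
}
```

With this, "xs = (short) x;" would become "xs = to_pcm16(x);". Clipping still distorts the few samples that overshoot, but the result is at least well defined.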

We tried to synthesize the same sentence with two different training results: one trained on about 1000 sentences and the other on 3000 sentences.

For the 1000-sentence training we got no saturation, but for the 3000-sentence training we got saturation in one state (10 frames, or 50 ms) of one phoneme (5 states). It was a "small" saturation (< 5 %).

We compared the features (means and variances for cepstrum and f0) of the models selected by hts_engine for the 1000- and 3000-sentence trainings, and they are roughly the same.

To correct this, we simply scaled all the "x" by 0.75 before computing xs, but we don't think it is an optimal solution because it reduces the dynamic range of the signal. (Another solution is to write the raw file as float or as 32-bit integers, but we don't find that practical.)
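
One alternative to a fixed 0.75 factor, sketched below, is to scale only the utterances whose peak actually exceeds the 16-bit range. This assumes the whole utterance can be buffered before it is written, which may not match how vocoder.cpp emits samples. The function name peak_safe_gain and its limit parameter are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: compute a gain for a buffered utterance so that its
// largest absolute sample fits within `limit`. Returns 1.0 when no sample
// exceeds the limit, so utterances that do not clip are left untouched.
double peak_safe_gain(const std::vector<double>& samples,
                      double limit = 32767.0) {
    double peak = 0.0;
    for (double x : samples) {
        peak = std::max(peak, std::fabs(x));
    }
    return (peak > limit) ? limit / peak : 1.0;
}
```

Each sample would then be multiplied by the returned gain before the conversion to short. The drawback is that the output level can differ from one utterance to the next.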

Has anyone met this problem before?

Thanks

Alexis Moinet

PhD student
FPMs - TCTS Lab


Follow-Ups
[hts-users:00503] Re: saturation in the raw, Nicholas Volk