[hts-users:04398] Re: synthesis with STRAIGHT

Thanks, all, for the helpful replies!

The hts-engine sample was trained using hts-2.2. The STRAIGHT samples were trained using hts-2.3 and STRAIGHT V40. When I said we are using the same data for both, I just meant the same corpus -- the STRAIGHT training additionally includes the 'bap' (band aperiodicity) features.

The corpus we are training on comes from 3 different speakers, and we have LOWERF0 and UPPERF0 set to 110 and 280 respectively (the same as in the SLT demo) for all speakers, for both STRAIGHT and hts-engine. It sounds like these most likely ought to be changed, which I will try. Are there any other parameters I should change, in particular for the voiced/unvoiced decision? I checked both data/Makefile and scripts/Config.pm and didn't see anything that looked relevant.
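
A minimal sketch of one way to pick separate LOWERF0 and UPPERF0 values for each speaker: take low and high percentiles of the voiced F0 values from a first extraction pass with a wide search range, then widen them by a margin. The function name, the percentile choices, the 20% margin, and the input format (a list of per-frame F0 values in Hz, with 0 for unvoiced frames) are all assumptions for illustration and are not part of the HTS demo scripts.

```python
def suggest_f0_bounds(f0_hz, low_pct=5.0, high_pct=95.0, margin=0.2):
    """Suggest (lower, upper) F0 search bounds in Hz for one speaker.

    f0_hz: per-frame F0 values in Hz; values <= 0 are treated as unvoiced.
    The percentiles and margin are illustrative defaults, not HTS settings.
    """
    voiced = sorted(v for v in f0_hz if v > 0)
    if not voiced:
        raise ValueError("no voiced frames found")

    def percentile(p):
        # Nearest-rank percentile on the sorted voiced values.
        idx = round(p / 100.0 * (len(voiced) - 1))
        return voiced[idx]

    lower = percentile(low_pct) * (1.0 - margin)
    upper = percentile(high_pct) * (1.0 + margin)
    return round(lower), round(upper)
```

Running this once per speaker (rather than once over the pooled corpus) would give one pair of bounds per speaker, which could then be compared against the 110 / 280 values taken from the SLT demo.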

Best,
Erica

On Mon, Mar 28, 2016 at 9:32 AM, Blaise Potard <bpotard@xxxxxxxxx> wrote:

Hello,

I am not entirely sure which version of the HTS demo or STRAIGHT you are using, as I don't think the demo normally sounds like this. Regardless, when you say you use the exact same data for hts_engine and STRAIGHT synthesis, do you mean you are not using mixed excitation at all?

In any case, 1mix / 2mix / stc will produce different parameters from what hts-engine is generating, so if you want a fair comparison, you are probably better off dumping the filter / excitation feature coefficients from hts_engine using the -om / -of options, and then doing the synthesis from the generated coefficients using STRAIGHT.
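
Before resynthesizing with STRAIGHT, it may help to sanity-check the dumped F0 track, for example to see what fraction of frames came out voiced. The sketch below assumes the file written by -of is a headerless sequence of little-endian 32-bit floats holding log F0, with unvoiced frames marked by a very large negative value (-1e10 is the usual HTS convention). Please verify both assumptions against your hts_engine version. The function names and the threshold are hypothetical.

```python
import math
import struct


def read_lf0(path, unvoiced_threshold=-1.0e9):
    """Return per-frame F0 in Hz, with 0.0 for frames treated as unvoiced.

    Assumes raw little-endian float32 log-F0 values with no header.
    """
    with open(path, "rb") as f:
        data = f.read()
    n = len(data) // 4  # number of complete float32 values
    values = struct.unpack("<%df" % n, data[: n * 4])
    return [math.exp(v) if v > unvoiced_threshold else 0.0 for v in values]


def voiced_fraction(f0_hz):
    """Fraction of frames with F0 > 0 (0.0 for an empty track)."""
    if not f0_hz:
        return 0.0
    return sum(1 for v in f0_hz if v > 0) / len(f0_hz)
```

A voiced fraction that is much lower than expected for the utterance, or F0 values pinned near the search bounds, would point toward the F0 extraction settings rather than the vocoder.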

If you still have problems with the synthesis using the parameters generated by HTS-engine, then probably you are using a bad version of STRAIGHT.

If you don't have problems with the synthesis, then it is likely that something went wrong during model training or, as Rasmus mentioned, during feature extraction.

Regards,
Blaise

2016-03-25 14:21 GMT+00:00 Erica Cooper <ecooper@xxxxxxxxxxxxxxx>:
Hi,

We've started using STRAIGHT for synthesis, and we've found that for our data, it sounds worse than synthesis with hts-engine, despite the STRAIGHT SLT demo voice sounding very nice. We are using the exact same data with both STRAIGHT and hts-engine synthesis, but the STRAIGHT-synthesized utterances sound 'hoarse.' 1mix, 2mix, and stc are all not so good. I was wondering whether there is any advice for which parameters might be changed to solve this.

original hts-engine voice: http://www.cs.columbia.edu/~ecooper/audio/eng_alice01.wav
STRAIGHT 1mix: http://www.cs.columbia.edu/~ecooper/audio/1mix_alice01.wav
STRAIGHT 2mix: http://www.cs.columbia.edu/~ecooper/audio/2mix_alice01.wav
STRAIGHT stc: http://www.cs.columbia.edu/~ecooper/audio/stc_alice01.wav

Thanks,
Erica