Hi,

You are asking a question that most of the engineers in this community would like answered.

It is important to understand the roles of the different tools that make up the whole HTS package. The HTS toolkit itself provides acoustic model training from features and labels, and feature generation from a trained model and the labels of the utterance to be synthesised. In other words, HTS provides feature-to-model (the trainer) and model-to-feature (the generator). How the features are estimated from speech (encoding) and how the waveform is reconstructed from the generated features (decoding) are the responsibilities of the vocoder.

As Sebastien said, hts_engine doesn't support STRAIGHT. hts_engine is a lite version of the synthesiser, meaning that it combines the generator and the decoder. It is 'lite' in two ways: 1) the simplest feature generation algorithm is applied, and 2) a fast decoder is applied, i.e. the MLSA-based filter, which works with a binary excitation signal. "Doesn't support STRAIGHT" means that the STRAIGHT decoder is not part of hts_engine. But if you can find a way to decode the STRAIGHT spectrum using the MLSA-based filter, you can "generate hts voice using HTS-demo-STRAIGHT".

Last time I checked (2 years ago...), the limitation on the number of streams existed because hts_engine is supposed to work with only mgc, lf0, and lpf. There is a lot of work attempting to "avoid vocoder noise". If you need more than 3 streams with hts_engine, I suggest you use my single stream generator (http://github.com/naxingyu/StreamGenerator; you can also find it on the HTS extensions page). It generates a single stream of features, so you can use more streams and combine it with your own decoder.

FYI, this topic has been discussed extensively on this mailing list. Checking previous threads would normally help.

Best,
Xingyu

On 04/20/2015 07:34 PM, payman shaykhmehdi wrote: