The HMM/DNN-based Speech Synthesis System (HTS) has been developed by the HTS working group and others (see Who we are and Acknowledgments). The training part of HTS is implemented as a modified version of HTK and is released as patch code for HTK. The patch code is released under a free software license. Note, however, that once you apply the patch to HTK, you must comply with the HTK license. Related publications about the techniques and algorithms used in HTS can be found here.
HTS version 2.3 includes VBLR speaker adaptation, a DAEM-based parameter generation algorithm, and other minor new features. Many bugs in HTS version 2.2 were also fixed. HTS does not include any text analyzers, but the Festival Speech Synthesis System (English, Spanish, etc.), the DFKI MARY Text-to-Speech System (German, English, etc.), Flite+hts_engine (English), Open JTalk (Japanese), or other text analyzers can be used with HTS. HTS slides are also released as a tutorial on HMM-based speech synthesis.
This distribution includes demo scripts for training speaker-dependent and speaker-adaptive systems using the CMU ARCTIC database (English). For training other voices, demo scripts using the NITech database (Portuguese, Japanese, and Japanese song) are also released.
In addition, the HTS version 2.3.1 demo scripts support a frame-by-frame modeling option using a DNN (deep neural network) based on HMM state alignment.
The code to train DNN-HSMM for text-to-speech synthesis was released.
A DNN-HSMM maps phoneme (state)-level linguistic features into hidden semi-Markov model parameters.
- The code:
  - Supports model training based on a maximum likelihood criterion.
  - Supports maximum likelihood parameter generation (MLPG).
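To make the MLPG item more concrete, here is a minimal Python sketch of the textbook formulation for a single one-dimensional feature stream. It is not the code shipped with HTS or the DNN-HSMM release. The function names, the interleaved input layout, and the delta window [-0.5, 0, 0.5] with truncated edges are all assumptions made for illustration. Given per-frame Gaussian means and variances for static and delta features, it finds the static trajectory c that solves (W^T P W) c = W^T P mu, where W stacks the static and delta windows and P is the diagonal matrix of inverse variances.

```python
def build_window_matrix(num_frames):
    """Build the (2T x T) matrix W that maps a static trajectory to
    interleaved [static_t, delta_t] observations (assumed windows)."""
    T = num_frames
    W = [[0.0] * T for _ in range(2 * T)]
    for t in range(T):
        W[2 * t][t] = 1.0                  # static window: [1]
        if t > 0:
            W[2 * t + 1][t - 1] = -0.5     # delta window: [-0.5, 0, 0.5]
        if t < T - 1:
            W[2 * t + 1][t + 1] = 0.5      # edges are simply truncated
    return W


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def mlpg(means, variances):
    """Return the maximum likelihood static trajectory.

    means, variances: lists of length 2T laid out as
    [static_0, delta_0, static_1, delta_1, ...] (hypothetical layout).
    """
    if len(means) != len(variances) or len(means) % 2 != 0:
        raise ValueError("means and variances must both have length 2T")
    T = len(means) // 2
    W = build_window_matrix(T)
    A = [[0.0] * T for _ in range(T)]      # accumulates W^T P W
    b = [0.0] * T                          # accumulates W^T P mu
    for r in range(2 * T):
        precision = 1.0 / variances[r]
        for i in range(T):
            if W[r][i] == 0.0:
                continue
            b[i] += W[r][i] * precision * means[r]
            for j in range(T):
                A[i][j] += W[r][i] * precision * W[r][j]
    return solve_linear(A, b)


if __name__ == "__main__":
    # A step in the static means with zero-mean deltas is smoothed.
    mu = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0]
    var = [1.0, 0.1] * 4
    print(mlpg(mu, var))
```

The dense solver keeps the sketch dependency-free. A practical implementation would normally exploit the banded structure of W^T P W, for example with a banded Cholesky factorization, so that the cost grows linearly with the number of frames.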
HTS version 2.3.2 was released.
Its new features are:
- Demo scripts:
  - Add trajectory training considering global variance based on DNN (deep neural network).
  - Add speaker adaptive training for DNN. (It trains the connection weights of the whole DNN for each speaker.)
HTS version 2.3.1 was released.
Its new features are:
- Demo scripts:
  - Add frame-by-frame modeling option using DNN (deep neural network) based on HMM state alignment.
HTS version 2.3 was released.
Its new features are:
- HERest:
  - Add VBLR adaptation.
- HMGenS:
  - Add DAEM-based parameter generation.
  - Support DP search to determine state duration when the model alignments are given.
- HInit, HRest:
  - Support parallel mode.
- HHEd:
  - Speed up context clustering by calculating differences between answers to current and previous questions.
  - Add a function for untying weights.
- Demo scripts:
  - Add modulation spectrum-based postfilter.
  - Support text files instead of utt files for general English database.
  - Turn off spectrum normalization in STRAIGHT.
  - Add LSP postfilter.
  - Support mel-cepstrum based aperiodic measure generated by STRAIGHT.
  - Support new HTS voice format for hts_engine API.
  - Integrate normal demo and STRAIGHT demo.
HTS version 2.3 beta was released to the hts-users ML members.
A tutorial on HMM-based speech synthesis was published in the Proceedings of the IEEE: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6495700
HTS version 2.3 alpha was released to the hts-users ML members.