History source of Home(No. 67) - HMM/DNN-based speech synthesis system (HTS)

* Welcome! [#k4f3be02]
> The [[HMM-based Speech Synthesis System (HTS)>http://hts.sp.nitech.ac.jp/]] has been developed by the HTS working group and others (see [[Who we are]] and [[Acknowledgments]]).  The training part of HTS has been implemented as a modified version of [[HTK:http://htk.eng.cam.ac.uk/]] and is released in the form of a patch to HTK.  The patch code is released under a free software license.  However, it should be noted that &color(red){once you apply the patch to HTK, you must obey the [[license of HTK:http://htk.eng.cam.ac.uk/docs/license.shtml]].};
Related publications about the techniques and algorithms used in HTS can be
found [[here>Publications]].

// 2.2

> HTS version 2.2 includes the deterministic annealing EM (DAEM) algorithm in the parameter estimation step, KLD-based state mapping and cross-lingual speaker adaptation, minimum generation error (MGE) training, and other minor new features.
Many bugs in HTS version 2.1.1 were also fixed.
HTS does not include any text analyzers, but the [[Festival Speech Synthesis System>http://www.festvox.org/festival/]] (English, Spanish, etc.), [[DFKI MARY Text-to-Speech System>http://mary.dfki.de/]] (German, English, etc.), [[Flite+hts_engine>http://hts-engine.sourceforge.net]] (English), [[Open JTalk>http://open-jtalk.sourceforge.net/]] (Japanese), or other text analyzers can be used with HTS.
Slides on HMM-based speech synthesis are also released as a tutorial.

> This distribution includes demo scripts for training speaker-dependent and speaker-adaptive systems using the [[CMU ARCTIC database>http://www.festvox.org/cmu_arctic/]] (English).
For training other voices, demo scripts using the Nitech database (Portuguese, Japanese, and Japanese song) are also released.

// 2.1.1

//> HTS version 2.1.1 is based on HTK-3.4.1 and includes forced-alignment of hidden semi-Markov model (HSMM) and other minor new features.
//Many bugs in HTS version 2.1 were also fixed.
//HTS does not include any text analyzers but the [[Festival Speech Synthesis System:http://www.festvox.org/festival/]], [[DFKI MARY Text-to-Speech System:http://mary.dfki.de/]], [[Flite+hts_engine>http://hts-engine.sourceforge.net]], or other text analyzers can be used with HTS.
//This distribution includes demo scripts for training speaker-dependent and speaker-adaptive systems using [[CMU ARCTIC database:http://www.festvox.org/cmu_arctic/]] (English).

//> For training Japanese voices, a demo script using the Nitech database is also prepared.  Japanese voices trained by the demo script can be used on [[Open JTalk>http://open-jtalk.sourceforge.net/]], which is a Japanese text-to-speech synthesis system.

// 2.1

//> HTS version 2.1 includes hidden semi-Markov model (HSMM) training/adaptation/synthesis, a speech parameter generation algorithm considering global variance (GV), SMAPLR/CSMAPLR adaptation, and other minor new features.  Many bugs in HTS version 2.0.1 were also fixed.  The API for the runtime synthesis module, hts_engine API version 1.0, was also released.  Because hts_engine can run without the HTK library, users can develop their own open or proprietary software based on hts_engine.  HTS and hts_engine API do not include any text analyzers, but the [[Festival Speech Synthesis System:http://www.festvox.org/festival/]], [[DFKI MARY Text-to-Speech System:http://mary.dfki.de/]], or other text analyzers can be used with HTS.  This distribution includes demo scripts for training speaker-dependent and speaker-adaptive systems using the [[CMU ARCTIC database:http://www.festvox.org/cmu_arctic/]] (English).
//Six HTS voices for Festival 1.96 are also released.  They use the hts_engine module included in Festival.  Each HTS voice can be used without any other HTS tools.

//> For training Japanese voices, a demo script using the Nitech database is also prepared.  Japanese voices trained by the demo script can be used on [[GalateaTalk:http://hil.t.u-tokyo.ac.jp/~galatea/]], which is a speech synthesis module of an open-source toolkit for anthropomorphic spoken dialogue agents developed in [[Galatea project:http://hil.t.u-tokyo.ac.jp/~galatea/]].  An HTS voice for Galatea trained by the demo script is also released.

* News! [#ve28e7f9]

- ''July 7, 2011''
> HTS version 2.2 was released.~
Its new features are
- HERest:
-- Support the DAEM algorithm in the parameter estimation step.
- HHEd:
-- Support KLD-based state-mapping and cross-lingual speaker adaptation.
-- Context clustering can be started from the middle of tree building.
- HMgeTool:
-- Add ECD-based MGE training command, HMgeTool.
- HSMMAlign:
-- Add stand-alone HSMM based forced-alignment command, HSMMAlign.
- Demo scripts:
-- Change sampling frequency from 16kHz to 48kHz.
-- Support the Bark critical-band-based aperiodicity measure.
-- Change the speaker of the Brazilian Portuguese demo and the singer of the Japanese song demo.
- Slides:
-- Release slides as a tutorial on HMM-based speech synthesis.

- ''March 3, 2011''
> HTS version 2.2 beta was released to the hts-users ML members.

- ''December 25, 2010''
> HTS version 2.2 alpha was released to the hts-users ML members.

//- ''May 14, 2010''
//> HTS version 2.1.1 was released.~
//Its new features are
//- Based on HTK-3.4.1
//- Many bug fixes
//- HFst:
//-- WFST converter for forced-alignment of HSMM
//- HMGenS:
//-- Initial GV weight for parameter generation
//-- Model-level alignments taken from singing-voice labels to determine note-level durations
//- HHEd:
//-- Memory reduction options for context-clustering
//- Demo scripts:
//-- Context-dependent GV excluding silence and pause phonemes
//-- Demo using the Nitech Japanese database for singing voice synthesis

//- ''December 25, 2009''
//> HTS version 2.1.1 beta was released to the hts-users ML members.

//- ''August 27, 2009''
//> The first HTS meeting in [[Interspeech 2009:http://www.interspeech2009.org/conference/]].

//- ''May 22, 2009''
//> HTS-Demo for Brazilian Portuguese was released.

// - ''March 16, 2009''
// > Prof. Keiichi Tokuda & Dr. Heiga Zen have a [[tutorial about HMM-based speech synthesis>Tutorial]] at [[Interspeech 2009:http://www.interspeech2009.org/conference/]].  

//- ''July 31, 2008''
//> The API of the runtime synthesis engine, hts_engine API, was split from HTS itself and moved to [[SourceForge:http://hts-engine.sourceforge.net/]].~
// hts_engine API version 1.01 and Flite+hts_engine version 0.90 were released.

//- ''July 14, 2008''
//> [[Keiichiro Oura:http://www.sp.nitech.ac.jp/~uratec/]] took over as maintainer of HTS from [[Heiga Zen:http://www.sp.nitech.ac.jp/~zen/]].

//- ''June 27, 2008''
//> HTS version 2.1 and hts_engine API version 1.0 were released.~
//Their new features are
//- HTS-2.1
//-- Many bug fixes
//-- Released under the [[New and Simplified BSD license:http://www.opensource.org/]]
//-- Simple documentation
//-- 64-bit compile support
//-- MAXSTRLEN (max length of strings), SMAX (max # of streams), and PAT_LEN (max length of patterns) can be set through the configure script, for example:
// ./configure MAXSTRLEN=1024 SMAX=20
//-- HFB:
//--- HSMM training and adaptation
//-- HAdapt:
//--- SMAPLR/CSMAPLR adaptation
//-- HGen:
//--- Speech parameter generation algorithm considering GV
//--- Random generation of state transitions, state durations, and mixture components (by configuration variable RNDFLAGS)
//-- HMGenS:
//--- Speech parameter generation from HSMMs
//-- HHEd:
//--- Add DM command to delete existing macros
//--- Add IT command to impose pre-built trees in clustering
//--- Add JM command to merge different models at the state or stream level
//--- MU command supports '*2' style mixing up
//--- MU command supports mixture-level occupancy threshold in mixing up (by configuration variable MINMIXOCC)
//- hts_engine API-1.0:
//-- Released under the [[New and Simplified BSD license:http://www.opensource.org/]]
//-- Support LSP-type parameters including LSP, mel-LSP, and MGC-LSP
//-- Speech parameter generation algorithm considering GV

//- ''June 13, 2008''
//> HTS version 2.1RC2 and hts_engine API version 0.99 were released to the hts-users ML members.~
//See [[here:http://hts.sp.nitech.ac.jp/hts-users/spool/2008/msg00336.html]] for details.

// - ''May 27, 2008''
// > HTS voice building tools for the MARY platform were released with [[DFKI MARY 3.6.0:http://mary.dfki.de/Download/mary-3-6-0-released]].
// 
// - ''March 24, 2008''
// > HTS version 2.1RC1 and hts_engine API version 0.96 were released to the hts-users ML members. See [[here:http://hts.sp.nitech.ac.jp/hts-users/spool/2008/msg00175.html]] for details.

// - ''January 15, 2008''
// > HTS version 2.1beta and hts_engine API version 0.95 were released to the hts-users ML members.

// - ''December 7, 2007''
// > hts_engine was ported to Java and included in [[DFKI MARY 3.5:http://mary.dfki.de/Download/mary-3-5-0-released]]. 

// - ''November 1, 2007''
// > HTS version 2.1alpha was released to the hts-users ML members.
// - ''October 1, 2007''
// > HTS version 2.0.1 and hts_engine API version 0.9 were released.~
// The new features are
// - Many bug fixes.
// - Band structure for linear transforms.
// - Stream-dependent variance flooring scales.
// - The MMF structure of state duration models has changed.  Previous versions used a multivariate Gaussian PDF to represent the state duration PDFs of an HMM; from this version, a multi-stream structure is used instead.  This is very important for future HSMM support.
// - Demo scripts support LSP-type parameters for spectral representation in addition to cepstral ones.
// - API-style implementation of hts_engine.  The old stand-alone hts_engine will be dropped.

// - ''September 20, 2007''
// > HTS version 2.0.1RC1 was released to the hts-users ML members.

// - ''September 18, 2007''
// > HTS version 2.0.1RC1 was released to the internal working group members.