History source of Home(No. 73) - HMM/DNN-based speech synthesis system (HTS)

* Welcome! [#k4f3be02]
> The [[HMM/DNN-based Speech Synthesis System (HTS)>http://hts.sp.nitech.ac.jp/]] has been developed by the HTS working group and others (see [[Who we are]] and [[Acknowledgments]]).  The training part of HTS has been implemented as a modified version of [[HTK:http://htk.eng.cam.ac.uk/]] and released in the form of a patch to HTK.  The patch code is released under a free software license.  However, it should be noted that &color(red){once you apply the patch to HTK, you must obey the [[license of HTK:http://htk.eng.cam.ac.uk/docs/license.shtml]].};
Related publications about the techniques and algorithms used in HTS can be
found [[here>Publications]].

// 2.3

> HTS version 2.3 includes VBLR speaker adaptation, a DAEM-based parameter generation algorithm, and other minor new features.
Many bugs in HTS version 2.2 were also fixed.
HTS does not include a text analyzer, but the [[Festival Speech Synthesis System>http://www.festvox.org/festival/]] (English, Spanish, etc.), [[DFKI MARY Text-to-Speech System>http://mary.dfki.de/]] (German, English, etc.), [[Flite+hts_engine>http://hts-engine.sourceforge.net]] (English), [[Open JTalk>http://open-jtalk.sourceforge.net/]] (Japanese), or other text analyzers can be used with HTS.
HTS slides are also released as a tutorial on HMM-based speech synthesis.

> This distribution includes demo scripts for training speaker-dependent and speaker-adaptive systems using [[CMU ARCTIC database>http://www.festvox.org/cmu_arctic/]] (English).
For training other voices, demo scripts using the NITech database (Portuguese, Japanese, and Japanese song) are also released.

> In addition, the HTS version 2.3.1 demo scripts support a frame-by-frame modeling option using &color(red){DNN (deep neural network)}; based on HMM state alignment.

// 2.2

//> HTS version 2.2 includes deterministic annealing EM algorithm in parameter estimation step, KLD-based state-mapping and cross-lingual speaker adaptation, minimum generation error (MGE) training, and other minor new features.
//Many bugs in HTS version 2.1.1 were also fixed.
//HTS does not include a text analyzer, but the [[Festival Speech Synthesis System>http://www.festvox.org/festival/]] (English, Spanish, etc.), [[DFKI MARY Text-to-Speech System>http://mary.dfki.de/]] (German, English, etc.), [[Flite+hts_engine>http://hts-engine.sourceforge.net]] (English), [[Open JTalk>http://open-jtalk.sourceforge.net/]] (Japanese), or other text analyzers can be used with HTS.
//HTS slides are also released as a tutorial of HMM-based speech synthesis.

//> This distribution includes demo scripts for training speaker-dependent and speaker-adaptive systems using [[CMU ARCTIC database>http://www.festvox.org/cmu_arctic/]] (English).
//For training other voices, demo scripts using the Nitech database (Portuguese, Japanese, and Japanese song) are also released.

// 2.1.1

//> HTS version 2.1.1 is based on HTK-3.4.1 and includes forced-alignment of hidden semi-Markov model (HSMM) and other minor new features.
//Many bugs in HTS version 2.1 were also fixed.
//HTS does not include any text analyzers but the [[Festival Speech Synthesis System:http://www.festvox.org/festival/]], [[DFKI MARY Text-to-Speech System:http://mary.dfki.de/]], [[Flite+hts_engine>http://hts-engine.sourceforge.net]], or other text analyzers can be used with HTS.
//This distribution includes demo scripts for training speaker-dependent and speaker-adaptive systems using [[CMU ARCTIC database:http://www.festvox.org/cmu_arctic/]] (English).

//> For training Japanese voices, a demo script using the Nitech database is also prepared.  Japanese voices trained by the demo script can be used on [[Open JTalk>http://open-jtalk.sourceforge.net/]], which is a Japanese text-to-speech synthesis system.

// 2.1

//> HTS version 2.1 includes hidden semi-Markov model (HSMM) training/adaptation/synthesis, a speech parameter generation algorithm considering global variance (GV), SMAPLR/CSMAPLR adaptation, and other minor new features.  Many bugs in HTS version 2.0.1 were also fixed.  The API for the runtime synthesis module, hts_engine API version 1.0, was also released.  Because hts_engine can run without the HTK library, users can develop their own open or proprietary software based on hts_engine.  HTS and hts_engine API do not include any text analyzers, but the [[Festival Speech Synthesis System:http://www.festvox.org/festival/]], [[DFKI MARY Text-to-Speech System:http://mary.dfki.de/]], or other text analyzers can be used with HTS.  This distribution includes demo scripts for training speaker-dependent and speaker-adaptive systems using [[CMU ARCTIC database:http://www.festvox.org/cmu_arctic/]] (English).
//Six HTS voices for Festival 1.96 are also released.  They use the hts_engine module included in Festival.  Each HTS voice can be used without any other HTS tools.

//> For training Japanese voices, a demo script using the Nitech database is also prepared.  Japanese voices trained by the demo script can be used on [[GalateaTalk:http://hil.t.u-tokyo.ac.jp/~galatea/]], which is a speech synthesis module of an open-source toolkit for anthropomorphic spoken dialogue agents developed in [[Galatea project:http://hil.t.u-tokyo.ac.jp/~galatea/]].  An HTS voice for Galatea trained by the demo script is also released.

* News! [#ve28e7f9]

- ''December 25, 2017''
> HTS version 2.3.2 was released.~
Its new features are
- Demo scripts:
-- Add trajectory training considering global variance based on DNN (deep neural network).
-- Add speaker adaptive training for DNN. (It trains the connection weights of the whole DNN for each speaker.)

- ''December 25, 2016''
> HTS version 2.3.1 was released.~
Its new features are
- Demo scripts:
-- Add frame-by-frame modeling option using DNN (deep neural network) based on HMM state alignment.

- ''December 25, 2015''
> HTS version 2.3 was released.~
Its new features are
- HERest:
-- Add VBLR adaptation.
- HMGenS:
-- Add DAEM-based parameter generation.
-- Support DP search to determine state duration when the model alignments are given.
- HInit, HRest:
-- Support parallel mode.
- HHEd:
-- Speed up context-clustering by calculating differences between answers to current and previous questions.
-- Add a function for untying weights.
- Demo scripts:
-- Add modulation spectrum-based postfilter.
-- Support text files instead of utt files for general English databases.
-- Turn off spectrum normalization in STRAIGHT.
-- Add LSP postfilter.
-- Support mel-cepstrum based aperiodic measure generated by STRAIGHT.
-- Support the new HTS voice format for hts_engine API.
-- Integrate normal demo and STRAIGHT demo.

- ''December 25, 2014''
> HTS version 2.3 beta was released to the hts-users ML members.

- ''May 1, 2013''
> A tutorial on HMM-based speech synthesis was published in the Proceedings of the IEEE: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6495700

- ''December 25, 2012''
> HTS version 2.3 alpha was released to the hts-users ML members.

//- ''July 7, 2011''
//> HTS version 2.2 was released.~
//Its new features are
//- HERest:
//-- Support DAEM algorithm in parameter estimation step.
//- HHEd:
//-- Support KLD-based state-mapping and cross-lingual speaker adaptation.
//-- Context-clustering can be started in the middle of the tree building.
//- HMgeTool:
//-- Add ECD-based MGE training command, HMgeTool.
//- HSMMAlign:
//-- Add stand-alone HSMM based forced-alignment command, HSMMAlign.
//- Demo scripts:
//-- Change sampling frequency from 16kHz to 48kHz.
//-- Support bark critical-band based aperiodic measure.
//-- Change speaker and singer of Brazilian Portuguese and Japanese song demo, respectively.
//- Slides:
//-- Release slides as a tutorial of HMM-based speech synthesis.

//- ''March 3, 2011''
//> HTS version 2.2 beta was released to the hts-users ML members.
//- ''December 25, 2010''
//> HTS version 2.2 alpha was released to the hts-users ML members.
//- ''May 14, 2010''
//> HTS version 2.1.1 was released.~
//Its new features are
//- Based on HTK-3.4.1
//- Many bug fixes
//- HFst:
//-- WFST converter for forced-alignment of HSMM
//- HMGenS:
//-- Initial GV weight for parameter generation
//-- Model-level alignments given from label of singing voice to determine note-level durations
//- HHEd:
//-- Memory reduction options for context-clustering
//- Demo scripts:
//-- Context-dependent GV without silent and pause phoneme
//-- Demo using the Nitech Japanese database for singing voice synthesis

//- ''December 25, 2009''
//> HTS version 2.1.1 beta was released to the hts-users ML members.

//- ''August 27, 2009''
//> The first HTS meeting in [[Interspeech 2009:http://www.interspeech2009.org/conference/]].

//- ''May 22, 2009''
//> HTS-Demo for Brazilian Portuguese is released.

// - ''March 16, 2009''
// > Prof. Keiichi Tokuda & Dr. Heiga Zen have a [[tutorial about HMM-based speech synthesis>Tutorial]] at [[Interspeech 2009:http://www.interspeech2009.org/conference/]].  

//- ''July 31, 2008''
//> The API of the runtime synthesis engine, hts_engine API, was split off from HTS itself and moved to [[SourceForge:http://hts-engine.sourceforge.net/]].~
// hts_engine API version 1.01 and Flite+hts_engine version 0.90 were released.

//- ''July 14, 2008''
//> [[Keiichiro Oura:http://www.sp.nitech.ac.jp/~uratec/]] took over as maintainer of HTS from [[Heiga Zen:http://www.sp.nitech.ac.jp/~zen/]].
//- ''June 27, 2008''

//> HTS version 2.1 and hts_engine API version 1.0 were released.~
//Their new features are
//- HTS-2.1
//-- Many bug fixes
//-- Released under the [[New and Simplified BSD license:http://www.opensource.org/]]
//-- Simple documentation
//-- 64-bit compile support
//-- MAXSTRLEN (max length of strings), SMAX (max # of streams), and PAT_LEN (max length of patterns) can be set through the configure script like
// ./configure MAXSTRLEN=1024 SMAX=20
//-- HFB:
//--- HSMM training and adaptation
//-- HAdapt:
//--- SMAPLR/CSMAPLR adaptation
//-- HGen:
//--- Speech parameter generation algorithm considering GV
//--- Random generation of state transitions, state durations, and mixture components (by configuration variable RNDFLAGS)
//-- HMGenS:
//--- Speech parameter generation from HSMMs
//-- HHEd:
//--- Add DM command to delete existing macros
//--- Add IT command to impose pre-built trees in clustering
//--- Add JM command to merge difference models on state or stream levels
//--- MU command supports '*2' style mixing up
//--- MU command supports mixture-level occupancy threshold in mixing up (by configuration variable MINMIXOCC)
//- hts_engine API-1.0:
//-- Released under the [[New and Simplified BSD license:http://www.opensource.org/]]
//-- Support LSP-type parameters including LSP, mel-LSP, and MGC-LSP
//-- Speech parameter generation algorithm considering GV

//- ''June 13, 2008''
//> HTS version 2.1RC2 and hts_engine API version 0.99 were released to the hts-users ML members.~
//See [[here:http://hts.sp.nitech.ac.jp/hts-users/spool/2008/msg00336.html]] for details.

// - ''May 27, 2008''
// > HTS voice building tools for the MARY platform were released with [[DFKI MARY 3.6.0:http://mary.dfki.de/Download/mary-3-6-0-released]].
// 
// - ''March 24, 2008''
// > HTS version 2.1RC1 and hts_engine API version 0.96 were released to the hts-users ML members. See [[here:http://hts.sp.nitech.ac.jp/hts-users/spool/2008/msg00175.html]] for details.

// - ''January 15, 2008''
// > HTS version 2.1beta and hts_engine API version 0.95 were released to the hts-users ML members.

// - ''December 7, 2007''
// > hts_engine was ported to Java and included in [[DFKI MARY 3.5:http://mary.dfki.de/Download/mary-3-5-0-released]]. 

// - ''November 1, 2007''
// > HTS version 2.1alpha was released to the hts-users ML members.
// - ''October 1, 2007''
// > HTS version 2.0.1 and hts_engine_API version 0.9 were released.~
// The new features are
// - Many bug fixes.
// - Band structure for linear transforms.
// - Stream-dependent variance flooring scales.
// - State duration model MMF structure is changed.  In previous versions, we used a multivariate Gaussian PDF to represent the state duration PDFs of an HMM.  From this version on, we use a multi-stream structure.  This is very important for future HSMM support.
// - Demo scripts support LSP-type parameters for spectral representation in addition to cepstral ones.
// - API-style implementation of hts_engine.  Old stand-alone hts_engine will be thrown away.

// - ''September 20, 2007''
// > HTS version 2.0.1RC1 was released to the hts-users ML members.

// - ''September 18, 2007''
// > HTS version 2.0.1RC1 was released to the internal working group members.