History source of History(No. 37) - HMM/DNN-based speech synthesis system (HTS)

* History [#af7558b8]
#contents

** 2014 [#w71b4b73]
> ''December 25, 2014''
>> HTS version 2.3 beta was released to the hts-users ML members.

** 2013 [#sca5a2a6]
> ''May 1, 2013''
>> A tutorial about HMM-based speech synthesis was published in the Proceedings of the IEEE: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6495700

** 2012 [#k2361095]
> ''December 25, 2012''
>> HTS version 2.3 alpha was released to the hts-users ML members.

** 2011 [#j9420e9d]

> ''July 7, 2011''
>> HTS version 2.2 was released.~
Its new features are
- HERest:
-- Support DAEM algorithm in parameter estimation step.
- HHEd:
-- Support KLD-based state-mapping and cross-lingual speaker adaptation.
-- Context-clustering can be started in the middle of the tree building.
- HMgeTool:
-- Add ECD-based MGE training command, HMgeTool.
- HSMMAlign:
-- Add stand-alone HSMM based forced-alignment command, HSMMAlign.
- Demo scripts:
-- Change sampling frequency from 16kHz to 48kHz.
-- Support bark critical-band based aperiodic measure.
-- Change the speaker of the Brazilian Portuguese demo and the singer of the Japanese song demo.
- Slides:
-- Release slides as a tutorial of HMM-based speech synthesis.

> ''March 3, 2011''
>> HTS version 2.2 beta was released to the hts-users ML members.
** 2010 [#tde9d64b]
> ''December 25, 2010''
>> HTS version 2.2 alpha was released to the hts-users ML members.
> ''May 14, 2010''
>> HTS version 2.1.1 was released.~
Its new features are
- Based on HTK-3.4.1
- Many bug fixes
- HFst:
-- WFST converter for forced-alignment of HSMM
- HMGenS:
-- Initial GV weight for parameter generation
-- Model-level alignments can be given from singing-voice labels to determine note-level durations
- HHEd:
-- Memory reduction options for context-clustering
- Demo scripts:
-- Context-dependent GV without silence and pause phonemes
-- Demo using the Nitech Japanese database for singing voice synthesis

** 2009 [#f3d6c52f]

> ''December 25, 2009''
>> HTS version 2.1.1 beta was released to the hts-users ML members.
> ''August 27, 2009''
>> The first HTS meeting was held at [[Interspeech 2009:http://www.interspeech2009.org/conference/]].
> ''May 22, 2009''
>> HTS-Demo for Brazilian Portuguese was released.
> ''March 16, 2009''
>> Prof. Keiichi Tokuda & Dr. Heiga Zen give a [[tutorial about HMM-based speech synthesis>Tutorial]] at [[Interspeech 2009:http://www.interspeech2009.org/conference/]].

** 2008 [#qd958615]

> ''July 31, 2008''
>> hts_engine website moved to http://hts-engine.sourceforge.net/~
hts_engine API version 1.01 and Flite+hts_engine version 0.90 were released.
> ''July 14, 2008''
>> [[Keiichiro Oura:http://www.sp.nitech.ac.jp/~uratec/]] took over as maintainer of HTS from [[Heiga Zen:http://www.sp.nitech.ac.jp/~zen/]].
> ''June 27, 2008''
>> HTS version 2.1 and hts_engine API version 1.0 were released.~
Their new features are
- HTS-2.1
-- Many bug fixes
-- Released under the New and Simplified BSD license
-- Simple documentation
-- 64-bit compile support
-- MAXSTRLEN (max length of strings), SMAX (max # of streams), and PAT_LEN (max length of patterns) can be set through the configure script, for example:
 ./configure MAXSTRLEN=1024 SMAX=20
-- HFB:
--- HSMM training and adaptation
-- HAdapt:
--- SMAPLR/CSMAPLR adaptation
-- HGen:
--- Speech parameter generation algorithm considering GV
--- Random generation of state transitions, state durations, and mixture components (by configuration variable RNDFLAGS)
-- HMGenS:
--- Speech parameter generation from HSMMs
-- HHEd:
--- Add DM command to delete existing macros
--- Add IT command to impose pre-built trees in clustering
--- Add JM command to merge different models on state or stream levels
--- MU command supports '*2' style mixing up
--- MU command supports mixture-level occupancy threshold in mixing up (by configuration variable MINMIXOCC)
- hts_engine API-1.0:
-- Released under the New and Simplified BSD license
-- Support LSP-type parameters including LSP, mel-LSP, and MGC-LSP
-- Speech parameter generation algorithm considering GV

> ''June 13, 2008''
>> HTS version 2.1RC2 and hts_engine API version 0.99 were released to the hts-users ML members.
> ''May 27, 2008''
>> HTS voice building tools for the MARY platform were released with [[DFKI MARY 3.6.0:http://mary.dfki.de/Download/mary-3-6-0-released]].
> ''March 24, 2008''
>> HTS version 2.1RC1 and hts_engine API version 0.96 were released to the hts-users ML members.
> ''January 15, 2008''
>> HTS version 2.1beta and hts_engine_API version 0.95 were released to the hts-users ML members.

** 2007 [#afa4796d]
> ''December 7, 2007''
>> hts_engine was ported to Java and included in [[DFKI MARY 3.5:http://mary.dfki.de/Download/mary-3-5-0-released]]. 
> ''November 1, 2007''
>> HTS version 2.1alpha was released to the hts-users ML members.
> ''October 1, 2007''
>> HTS version 2.0.1 and hts_engine_API version 0.9 were released.~
The new features are
- Many bug fixes.
- Band structure for linear transforms.
- Stream-dependent variance flooring scales.
- The mmf structure of the state duration model has changed. Previous versions used a multi-variate Gaussian PDF to represent the state duration PDFs of an HMM; from this version, a multi-stream structure is used. This is very important for future HSMM support.
- Demo scripts support LSP-type parameters for spectral representation in addition to cepstral ones.
- API-style implementation of hts_engine. The old stand-alone hts_engine will be discontinued.

> ''September 20, 2007''
>> HTS version 2.0.1RC1 was released to the hts-users ML members.
> ''September 18, 2007''
>> HTS version 2.0.1RC1 was released to the internal working group members.
> ''March 20, 2007''
>> Mail address of hts-users ML changed to hts-users@sp.nitech.ac.jp
> ''March 7, 2007''
>> HTS website moved to http://hts.sp.nitech.ac.jp/

** 2006 [#z5d7dda6]
>''December 29, 2006''
>> HTS version 2.0 was '''finally''' released :-)~
The new features are
- Based on [[HTK-3.4>http://htk.eng.cam.ac.uk/download.shtml]].
- Compilation without [[SPTK>http://www.sp.nitech.ac.jp/~tokuda/SPTK/index.html]].
- The [[license of HTS>License]] was slightly modified.
- Support gcc4.
- Thousands of bug fixes.
- HRest can generate state duration densities (-g option).
- Model boundaries can be given to HERest (-e option).~
Only a part of the model boundaries (e.g., pause positions) may be specified.
- Reduced-memory implementation of context clustering in HHEd (-r option).
- Each decision tree can have a name with regular expression (-p option).~
 TB 000 {(*-a+*, *-i+*, *-u+*).state[2]}
 TB 000 {(*-sil+*, *-pau+*).state[3]}
As a result, two different trees can be constructed for consonants and vowels, respectively.
- The interface of HMGenS has been switched from ''HHEd-style'' to ''HERest-style''.
- Flexible model structures in HMGenS (in the previous version, the first stream was assumed to be mcep, and the others were assumed to be log F0). Non-left-to-right models and full covariance matrices for state output pdfs can also be used.
- EM-based parameter generation algorithm (-c option), i.e., mixtures of Gaussians can be used.
-- -c 0: Cholesky decomposition
-- -c 1: EM (with fixed state sequence)
-- -c 2: EM (phone boundaries can be given with -e option)
- Random generation algorithm is also supported (set config. variable RNDPG = TRUE).
- Speaker adaptation, adaptive training, and semi-tied covariance transforms are supported for multi-stream HMMs/MSD-HMMs. 
-- MLLRMEAN, MLLRCOV, and CMLLR-based adaptation.
-- CMLLR-based adaptive training.
-- Decision trees for context clustering can be used to define regression classes for adaptation.
-- HMGenS can read MLLRMEAN, MLLRCOV, CMLLR, and SEMIT transforms for adaptation.
- MAP adaptation is also supported.
- Performance improvements in hts_engine.
- Miscellaneous changes.

>''December 4, 2006''
>> HTS version 2.0RC3 was released to members of the mailing list.
>''July 1, 2006''
>> HTS version 2.0RC2 was released to members of the mailing list.
>''March 3, 2006''
>> HTS version 2.0RC1 was released to members of the mailing list.
>''February 15, 2006''
>> HTS version 2.0RC0 was released to the internal working group. 

** 2003 [#ld26312a]
>''December 26, 2003''
>> HTS version 1.1.1 was released. The new features were
- Based on HTK-3.2.1
- Demo script for ARCTIC database
- Demo script for an original database (Japanese)
- Variance flooring in demo script
- Postfiltering in hts-engine
- Many bug fixes

>''October 14, 2003''
>> New HTS voices trained on the ARCTIC databases were released.
>''June 11, 2003''
>> HTS version 1.1b was released.
>''May 9, 2003''
>> HTS version 1.1 was released. The new features were
- A small synthesis engine (to be called from Festival).
- HMM file format converter for the engine.
- Many bug fixes (thanks for reporting them).
- Accompanied by HTS voices for Festival.

>''January 21, 2003''
>> A minor revision was made to HTS version 1.0.

** 2002 [#s0e2a8a8]
>''December 25, 2002''
>> HTS version 1.0 was released.
//>''September:''
//>> The first paper about eigenvoice for HMM-based speech synthesis was //appeared in ICSLP'02.
//
//** 1999 [#v67c04e5]
//> ''September:''
//>> The first paper about the current HTS framework was appeared in //Eurospeech'99. 
//> ''March:''
//>> The first paper about MSD-HMM was appeared in ICASSP'99.
//
//** 1998 [#v760e959]
//> ''November:''
//>> The first paper about speaker adaptation for the HMM-based speech synthesis //using MLLR was appeared in ESCA/COCOSDA workshop on speech synthesis.
//
//** 1997 [#v95967aa]
//> ''September:''
//>> The first paper about speaker interpolation for the HMM-based speech //synthesis was appeared in Eurospeech'97.
//
** 1995 [#v5285cba]
> ''May 9, 1995''
>> The first paper about the speech parameter generation algorithm appeared in ICASSP'95.