History - HMM/DNN-based speech synthesis system (HTS)

History
- 2015
- 2014
- 2013
- 2012
- 2011
- 2010
- 2009
- 2008
- 2007
- 2006
- 2003
- 2002
- 1995

2015

December 25, 2015

HTS version 2.3 was released.
Its new features are

HERest:
Add VBLR adaptation.

HMGenS:
Add DAEM-based parameter generation.

Support DP search to determine state duration when the model alignments are given.

HInit, HRest:
Support parallel mode.

HHEd:
Speed up context-clustering by calculating differences between answers to current and previous questions.

Add untying weights function in HHEd.

Demo scripts:
Add modulation spectrum-based postfilter.

Support text files instead of utt files for general English database.

Turn off spectrum normalization in STRAIGHT.

Add LSP postfilter.

Support mel-cepstrum based aperiodic measure generated by STRAIGHT.

Support new HTS voice format for hts engine API.

Integrate normal demo and STRAIGHT demo.

2014

December 25, 2014

HTS version 2.3 beta was released to the hts-users ML members.

2013

May 1, 2013

A tutorial on HMM-based speech synthesis was published in the Proceedings of the IEEE: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6495700

2012

December 25, 2012

HTS version 2.3 alpha was released to the hts-users ML members.

2011

July 7, 2011

HTS version 2.2 was released.
Its new features are

HERest:
Support DAEM algorithm in parameter estimation step.

HHEd:
Support KLD-based state-mapping and cross-lingual speaker adaptation.

Context-clustering can be started in the middle of the tree building.

HMgeTool:
Add ECD-based MGE training command, HMgeTool.

HSMMAlign:
Add stand-alone HSMM based forced-alignment command, HSMMAlign.

Demo scripts:
Change sampling frequency from 16kHz to 48kHz.

Support bark critical-band based aperiodic measure.

Change the speaker of the Brazilian Portuguese demo and the singer of the Japanese song demo.

Slides:
Release slides as a tutorial of HMM-based speech synthesis.

March 3, 2011

HTS version 2.2 beta was released to the hts-users ML members.

2010

December 25, 2010

HTS version 2.2 alpha was released to the hts-users ML members.

May 14, 2010

HTS version 2.1.1 was released.
Its new features are

Based on HTK-3.4.1

Many bug fixes

HFst:
WFST converter for forced-alignment of HSMM

HMGenS:
Initial GV weight for parameter generation

Model-level alignments given from label of singing voice to determine note-level durations

HHEd:
Memory reduction options for context-clustering

Demo scripts:
Context-dependent GV without silence and pause phonemes

Demo using the Nitech Japanese database for singing voice synthesis

2009

December 25, 2009

HTS version 2.1.1 beta was released to the hts-users ML members.

August 27, 2009

The first HTS meeting was held at Interspeech 2009.

May 22, 2009

HTS-Demo for Brazilian Portuguese was released.

March 16, 2009

Prof. Keiichi Tokuda & Dr. Heiga Zen will give a tutorial on HMM-based speech synthesis at Interspeech 2009.

2008

July 31, 2008

hts_engine website moved to http://hts-engine.sourceforge.net/
hts_engine API version 1.01 and Flite+hts_engine version 0.90 were released.

July 14, 2008

Keiichiro Oura took over as maintainer of HTS from Heiga Zen.

June 27, 2008

HTS version 2.1 and hts_engine API version 1.0 were released.
Their new features are

HTS-2.1:
Many bug fixes

Released under the New and Simplified BSD license

Simple documentation

64-bit compile support

MAXSTRLEN (max length of strings), SMAX (max # of streams), and PAT_LEN (max length of patterns) can be set through the configure script, for example:
./configure MAXSTRLEN=1024 SMAX=20

HFB:
HSMM training and adaptation

HAdapt:
SMAPLR/CSMAPLR adaptation

HGen:
Speech parameter generation algorithm considering GV

Random generation of state transitions, state durations, and mixture components (by configuration variable RNDFLAGS)

HMGenS:
Speech parameter generation from HSMMs

HHEd:
Add DM command to delete existing macros

Add IT command to impose pre-built trees in clustering

Add JM command to merge different models on state or stream levels

MU command supports '*2' style mixing up

MU command supports mixture-level occupancy threshold in mixing up (by configuration variable MINMIXOCC)

hts_engine API-1.0:
Released under the New and Simplified BSD license

Support LSP-type parameters including LSP, mel-LSP, and MGC-LSP

Speech parameter generation algorithm considering GV

June 13, 2008

HTS version 2.1RC2 and hts_engine API version 0.99 were released to the hts-users ML members.

May 27, 2008

HTS voice building tools for the MARY platform were released with DFKI MARY 3.6.0.

March 24, 2008

HTS version 2.1RC1 and hts_engine API version 0.96 were released to the hts-users ML members.

January 15, 2008

HTS version 2.1beta and hts_engine_API version 0.95 were released to the hts-users ML members.

2007

December 7, 2007

hts_engine was ported to Java and included in DFKI MARY 3.5.

November 1, 2007

HTS version 2.1alpha was released to the hts-users ML members.

October 1, 2007

HTS version 2.0.1 and hts_engine_API version 0.9 were released.
The new features are

Many bug fixes.

Band structure for linear transforms.

Stream-dependent variance flooring scales.

The MMF structure of the state duration model was changed. Previous versions used a multivariate Gaussian PDF to represent the state duration PDFs of an HMM; from this version, a multi-stream structure is used instead. This change is important for future HSMM support.

Demo scripts support LSP-type parameters for spectral representation in addition to cepstral ones.

API-style implementation of hts_engine. The old stand-alone hts_engine will be discontinued.

September 20, 2007

HTS version 2.0.1RC1 was released to the hts-users ML members.

September 18, 2007

HTS version 2.0.1RC1 was released to the internal working group members.

March 20, 2007

Mail address of hts-users ML changed to hts-users@sp.nitech.ac.jp

March 7, 2007

HTS website moved to http://hts.sp.nitech.ac.jp/

2006

December 29, 2006

HTS version 2.0 was finally released :-)
The new features are

Based on HTK-3.4.

Compilation without SPTK.

The license of HTS was slightly modified.

Support gcc4.

Thousands of fixed bugs.

HRest can generate state duration densities (-g option).

Model boundaries can be given to HERest (-e option).
It is also possible to specify only part of the model boundaries (e.g., pause positions).

Reduced-memory implementation of context clustering in HHEd (-r option).

Each decision tree can be given a name using a regular expression (-p option), for example:
TB 000 {(*-a+*, *-i+*, *-u+*).state[2]}
TB 000 {(*-sil+*, *-pau+*).state[3]}
As a result, two different trees can be constructed for consonants and vowels, respectively.

The interface of HMGenS has been switched from HHEd-style to HERest-style.

Flexible model structures in HMGenS (in the previous version, the first stream is assumed as mcep, and the others are assumed as log F0). Non-left-to-right models and full covariance matrices for state output pdfs can also be used.

EM-based parameter generation algorithm (-c option), i.e., mixtures of Gaussians can be used.

-c 0: Cholesky decomposition

-c 1: EM (with fixed state sequence)

-c 2: EM (phone boundaries can be given with -e option)

Random generation algorithm is also supported (set config. variable RNDPG = TRUE).

Speaker adaptation, adaptive training, and semi-tied covariance transforms are supported for multi-stream HMMs/MSD-HMMs.

MLLRMEAN, MLLRCOV, and CMLLR-based adaptation.

CMLLR-based adaptive training.

Decision trees for context clustering can be used to define regression classes for adaptation.

HMGenS can read MLLRMEAN, MLLRCOV, CMLLR, and SEMIT transforms for adaptation.

MAP adaptation is also supported.

Performance improvements in hts_engine.

Miscellaneous changes.

December 4, 2006

HTS version 2.0RC3 was released to members of the mailing list.

July 1, 2006

HTS version 2.0RC2 was released to members of the mailing list.

March 3, 2006

HTS version 2.0RC1 was released to members of the mailing list.

February 15, 2006

HTS version 2.0RC0 was released to the internal working group.

2003

December 26, 2003

HTS version 1.1.1 was released. The new features were

Based on HTK-3.2.1

Demo script for ARCTIC database

Demo script for an original database (Japanese)

Variance flooring in demo script

Postfiltering in hts-engine

Many fixed bugs

October 14, 2003

New HTS voices trained by ARCTIC databases were released.

June 11, 2003

HTS version 1.1b was released.

May 9, 2003

HTS version 1.1 was released. The new features were

A small synthesis engine (to be called from Festival).

HMM file format converter for the engine.

Many fixed bugs (Thanks for reporting them).

Accompanied by HTS voices for Festival.

January 21, 2003

A minor revision was made to HTS version 1.0.

2002

December 25, 2002

HTS version 1.0 was released.

1995

May 9, 1995

The first paper about the speech parameter generation algorithm appeared at ICASSP'95.
