History diff of Welcome vs current(No. 1) - HMM/DNN-based speech synthesis system (HTS)

- 1 (2020-08-06 (Thu) 12:12:24)
* Welcome! [#k4f3be02]
> The [[HMM/DNN-based Speech Synthesis System (HTS)>http://hts.sp.nitech.ac.jp/]] has been developed by the HTS working group and others (see [[Who we are]] and [[Acknowledgments]]).  The training part of HTS has been implemented as a modified version of [[HTK:http://htk.eng.cam.ac.uk/]] and released in the form of a patch to HTK.  The patch code is released under a free software license.  However, note that &color(red){once you apply the patch to HTK, you must obey the [[license of HTK:http://htk.eng.cam.ac.uk/docs/license.shtml]].};
Related publications about the techniques and algorithms used in HTS can be
found [[here>Publications]].

// 2.3

> HTS version 2.3 includes VBLR speaker adaptation, a DAEM-based parameter generation algorithm, and other minor new features.
Many bugs in HTS version 2.2 were also fixed.
HTS does not include any text analyzers, but the [[Festival Speech Synthesis System>http://www.festvox.org/festival/]] (English, Spanish, etc.), [[DFKI MARY Text-to-Speech System>http://mary.dfki.de/]] (German, English, etc.), [[Flite+hts_engine>http://hts-engine.sourceforge.net]] (English), [[Open JTalk>http://open-jtalk.sourceforge.net/]] (Japanese), or other text analyzers can be used with HTS.
HTS slides are also released as a tutorial on HMM-based speech synthesis.

> This distribution includes demo scripts for training speaker-dependent and speaker-adaptive systems using the [[CMU ARCTIC database>http://www.festvox.org/cmu_arctic/]] (English).
For training other voices, demo scripts using the NITech database (Portuguese, Japanese, and Japanese song) are also released.

> In addition, the HTS version 2.3.1 demo scripts support a frame-by-frame modeling option using &color(red){DNN (deep neural network)}; based on HMM state alignment.

// 2.2

//> HTS version 2.2 includes the deterministic annealing EM algorithm in the parameter estimation step, KLD-based state mapping and cross-lingual speaker adaptation, minimum generation error (MGE) training, and other minor new features.
//Many bugs in HTS version 2.1.1 were also fixed.
//HTS does not include any text analyzers, but the [[Festival Speech Synthesis System>http://www.festvox.org/festival/]] (English, Spanish, etc.), [[DFKI MARY Text-to-Speech System>http://mary.dfki.de/]] (German, English, etc.), [[Flite+hts_engine>http://hts-engine.sourceforge.net]] (English), [[Open JTalk>http://open-jtalk.sourceforge.net/]] (Japanese), or other text analyzers can be used with HTS.
//HTS slides are also released as a tutorial on HMM-based speech synthesis.

//> This distribution includes demo scripts for training speaker-dependent and speaker-adaptive systems using the [[CMU ARCTIC database>http://www.festvox.org/cmu_arctic/]] (English).
//For training other voices, demo scripts using the Nitech database (Portuguese, Japanese, and Japanese song) are also released.

// 2.1.1

//> HTS version 2.1.1 is based on HTK-3.4.1 and includes forced alignment of hidden semi-Markov models (HSMMs) and other minor new features.
//Many bugs in HTS version 2.1 were also fixed.
//HTS does not include any text analyzers, but the [[Festival Speech Synthesis System:http://www.festvox.org/festival/]], [[DFKI MARY Text-to-Speech System:http://mary.dfki.de/]], [[Flite+hts_engine>http://hts-engine.sourceforge.net]], or other text analyzers can be used with HTS.
//This distribution includes demo scripts for training speaker-dependent and speaker-adaptive systems using the [[CMU ARCTIC database:http://www.festvox.org/cmu_arctic/]] (English).

//> For training Japanese voices, a demo script using the Nitech database is also provided.  Japanese voices trained by the demo script can be used with [[Open JTalk>http://open-jtalk.sourceforge.net/]], a Japanese text-to-speech synthesis system.

// 2.1

//> HTS version 2.1 includes hidden semi-Markov model (HSMM) training/adaptation/synthesis, a speech parameter generation algorithm considering global variance (GV), SMAPLR/CSMAPLR adaptation, and other minor new features.  Many bugs in HTS version 2.0.1 were also fixed.  The API for the runtime synthesis module, hts_engine API version 1.0, was also released.  Because hts_engine can run without the HTK library, users can develop their own open or proprietary software based on hts_engine.  HTS and hts_engine API do not include any text analyzers, but the [[Festival Speech Synthesis System:http://www.festvox.org/festival/]], [[DFKI MARY Text-to-Speech System:http://mary.dfki.de/]], or other text analyzers can be used with HTS.  This distribution includes demo scripts for training speaker-dependent and speaker-adaptive systems using the [[CMU ARCTIC database:http://www.festvox.org/cmu_arctic/]] (English).
//Six HTS voices for Festival 1.96 are also released.  They use the hts_engine module included in Festival.  Each of the HTS voices can be used without any other HTS tools.

//> For training Japanese voices, a demo script using the Nitech database is also provided.  Japanese voices trained by the demo script can be used with [[GalateaTalk:http://hil.t.u-tokyo.ac.jp/~galatea/]], a speech synthesis module of an open-source toolkit for anthropomorphic spoken dialogue agents developed in the [[Galatea project:http://hil.t.u-tokyo.ac.jp/~galatea/]].  An HTS voice for Galatea trained by the demo script is also released.

 * News! [#ve28e7f9]
 
 - ''March 12, 2021''
> The code to train DNN-HSMM for text-to-speech synthesis was released.~
DNN-HSMM maps phoneme-level (state-level) linguistic features to hidden semi-Markov model parameters.~
 - The code:
 -- Supports model training based on a maximum likelihood criterion.
 -- Supports maximum likelihood parameter generation (MLPG).
 
 - ''December 25, 2017''
 > HTS version 2.3.2 was released.~
Its new features are:
 - Demo scripts:
 -- Add trajectory training considering global variance based on DNN (deep neural network).
 -- Add speaker adaptive training for DNN. (It trains the connection weights of the whole DNN for each speaker.)
 
 - ''December 25, 2016''
 > HTS version 2.3.1 was released.~
Its new features are:
 - Demo scripts:
-- Add a frame-by-frame modeling option using DNN (deep neural network) based on HMM state alignment.
 
 - ''December 25, 2015''
 > HTS version 2.3 was released.~
Its new features are:
 - HERest:
 -- Add VBLR adaptation.
 - HMGenS:
 -- Add DAEM-based parameter generation.
 -- Support DP search to determine state duration when the model alignments are given.
 - HInit, HRest, HRest:
 -- Support parallel mode.
 - HHEd:
 -- Speed up context-clustering by calculating differences between answers to current and previous questions.
-- Add a weight-untying function.
 - Demo scripts:
 -- Add modulation spectrum-based postfilter.
-- Support text files instead of utt files for a general English database.
 -- Turn off spectrum normalization in STRAIGHT.
 -- Add LSP postfilter.
 -- Support mel-cepstrum based aperiodic measure generated by STRAIGHT.
-- Support the new HTS voice format for hts_engine API.
 -- Integrate normal demo and STRAIGHT demo.
 
 - ''December 25, 2014''
 > HTS version 2.3 beta was released to the hts-users ML members.
 
 - ''May 1, 2013''
> A tutorial about HMM-based speech synthesis was published in the Proceedings of the IEEE: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6495700
 
 - ''December 25, 2012''
 > HTS version 2.3 alpha was released to the hts-users ML members.
 
 //- ''July 7, 2011''
 //> HTS version 2.2 was released.~
//Its new features are:
 //- HERest:
 //-- Support DAEM algorithm in parameter estimation step.
 //- HHEd:
 //-- Support KLD-based state-mapping and cross-lingual speaker adaptation.
 //-- Context-clustering can be started in the middle of the tree building.
 //- HMgeTool:
//-- Add ECD-based MGE training command, HMgeTool.
 //- HSMMAlign:
 //-- Add stand-alone HSMM based forced-alignment command, HSMMAlign.
 //- Demo scripts:
 //-- Change sampling frequency from 16kHz to 48kHz.
 //-- Support bark critical-band based aperiodic measure.
//-- Change the speaker of the Brazilian Portuguese demo and the singer of the Japanese song demo.
 //- Slides:
 //-- Release slides as a tutorial of HMM-based speech synthesis.
 
 //- ''March 3, 2011''
 //> HTS version 2.2 beta was released to the hts-users ML members.
 //- ''December 25, 2010''
 //> HTS version 2.2 alpha was released to the hts-users ML members.
 //- ''May 14, 2010''
//> HTS version 2.1.1 was released.~
//Its new features are:
 //- Based on HTK-3.4.1
 //- Many bug fixes
 //- HFst:
 //-- WFST converter for forced-alignment of HSMM
 //- HMGenS:
 //-- Initial GV weight for parameter generation
//-- Model-level alignments given by singing-voice labels to determine note-level durations
 //- HHEd:
 //-- Memory reduction options for context-clustering
 //- Demo scripts:
//-- Context-dependent GV excluding silence and pause phonemes
 //-- Demo using the Nitech Japanese database for singing voice synthesis
 
 //- ''December 25, 2009''
 //> HTS version 2.1.1 beta was released to the hts-users ML members.
 
 //- ''August 27, 2009''
 //> The first HTS meeting in [[Interspeech 2009:http://www.interspeech2009.org/conference/]].
 
 //- ''May 22, 2009''
//> HTS-Demo for Brazilian Portuguese was released.
 
 // - ''March 16, 2009''
// > Prof. Keiichi Tokuda & Dr. Heiga Zen give a [[tutorial about HMM-based speech synthesis>Tutorial]] at [[Interspeech 2009:http://www.interspeech2009.org/conference/]].
 
 //- ''July 31, 2008''
//> The API of the runtime synthesis engine, hts_engine API, was split off from HTS itself and moved to [[SourceForge:http://hts-engine.sourceforge.net/]].~
 // hts_engine API version 1.01 and Flite+hts_engine version 0.90 were released.
 
 //- ''July 14, 2008''
//> [[Keiichiro Oura:http://www.sp.nitech.ac.jp/~uratec/]] took over as maintainer of HTS from [[Heiga Zen:http://www.sp.nitech.ac.jp/~zen/]].
 //- ''June 27, 2008''
 
 //> HTS version 2.1 and hts_engine API version 1.0 were released.~
//Their new features are:
 //- HTS-2.1
 //-- Many bug fixes
//-- Released under the [[New and Simplified BSD license:http://www.opensource.org/]]
 //-- Simple documentation
 //-- 64-bit compile support
//-- MAXSTRLEN (max length of strings), SMAX (max # of streams), and PAT_LEN (max length of patterns) can be set through the configure script as follows:
 // ./configure MAXSTRLEN=1024 SMAX=20
 //-- HFB:
 //--- HSMM training and adaptation
 //-- HAdapt:
 //--- SMAPLR/CSMAPLR adaptation
 //-- HGen:
 //--- Speech parameter generation algorithm considering GV
//--- Random generation of state transitions, state durations, and mixture components (by configuration variable RNDFLAGS)
 //-- HMGenS:
 //--- Speech parameter generation from HSMMs
 //-- HHEd:
 //--- Add DM command to delete existing macros
 //--- Add IT command to impose pre-built trees in clustering
 //--- Add JM command to merge difference models on state or stream levels
 //--- MU command supports '*2' style mixing up
//--- MU command supports mixture-level occupancy threshold in mixing up (by configuration variable MINMIXOCC)
 //- hts_engine API-1.0:
//-- Released under the [[New and Simplified BSD license:http://www.opensource.org/]]
 //-- Support LSP-type parameters including LSP, mel-LSP, and MGC-LSP
 //-- Speech parameter generation algorithm considering GV
 
 //- ''June 13, 2008''
//> HTS version 2.1RC2 and hts_engine API version 0.99 were released to the hts-users ML members.~
//See [[here:http://hts.sp.nitech.ac.jp/hts-users/spool/2008/msg00336.html]] for details.
 
 // - ''May 27, 2008''
// > HTS voice building tools for the MARY platform were released with [[DFKI MARY 3.6.0:http://mary.dfki.de/Download/mary-3-6-0-released]].
 // 
 // - ''March 24, 2008''
 // > HTS version 2.1RC1 and hts_engine API version 0.96 were released to the hts-users ML members. See [[here:http://hts.sp.nitech.ac.jp/hts-users/spool/2008/msg00175.html]] for details.
 
 // - ''January 15, 2008''
 // > HTS version 2.1beta and hts_engine API version 0.95 were released to the hts-users ML members.
 
 // - ''December 7, 2007''
 // > hts_engine was ported to Java and included in [[DFKI MARY 3.5:http://mary.dfki.de/Download/mary-3-5-0-released]]. 
 
 // - ''November 1, 2007''
 // > HTS version 2.1alpha was released to the hts-users ML members.
 // - ''October 1, 2007''
// > HTS version 2.0.1 and hts_engine API version 0.9 were released.~
// The new features are:
 // - Many bug fixes.
 // - Band structure for linear transforms.
 // - Stream-dependent variance flooring scales.
// - The structure of the state duration model MMF has changed.  In previous versions, we used a multivariate Gaussian PDF to represent the state duration PDFs of an HMM.  From this version, however, we use a multi-stream structure.  This is very important for future HSMM support.
 // - Demo scripts support LSP-type parameters for spectral representation in addition to cepstral ones.
 // - API-style implementation of hts_engine.  Old stand-alone hts_engine will be thrown away.
 
 // - ''September 20, 2007''
 // > HTS version 2.0.1RC1 was released to the hts-users ML members.
 
 // - ''September 18, 2007''
 // > HTS version 2.0.1RC1 was released to the internal working group members.