History diff of HTS-2.0 README(No. 2) - HMM/DNN-based speech synthesis system (HTS)

HTS-2.0 README has been deleted.
- 1 (2006-12-26 (Tue) 09:32:53)
- 2 (2006-12-26 (Tue) 09:54:56)
- 3 (2006-12-28 (Thu) 05:58:27)
- 4 (2006-12-28 (Thu) 10:54:55)
- 5 (2006-12-28 (Thu) 15:07:07)
- 6 (2006-12-28 (Thu) 18:48:40)
 [[HTS-2.0 README]]
 * [[HTS-2.0 README]] [#a4de251a]
 #contents
 
 ================================================================
 	  The HMM-based Speech Synthesis System (HTS)
 	     version 2.0 release December 25, 2006
                                 
 The HMM-Based Speech Synthesis System (HTS)
 (http://hts.ics.nitech.ac.jp/) has been developed by the
 HTS working group (see "Who we are" below) and others (see
 "Acknowledgments" in the separate file).  The training part of
 HTS was implemented as a modified version of the HTK
 (http://htk.eng.cam.ac.uk/).  Major modifications which we made
 to HTK are listed below:
 ** Release notes [#kb654e7e]
 CENTER:&size(20){The HMM-based Speech Synthesis System (HTS) version 2.0};
 CENTER:&size(20){release December 29, 2006};
 
  - Context clustering based on MDL criterion (instead of ML one)
  - Stream-dependent context clustering
  - Multi-space probability distribution as state output
    probability (for pitch pattern modeling)
  - State duration modeling and clustering
 > The HMM-Based Speech Synthesis System (HTS) (http://hts.ics.nitech.ac.jp/) has been developed by the HTS working group (see "Who we are" below) and others (see "Acknowledgements" in the separate file).  The training part of HTS was implemented as a modified version of the HTK (http://htk.eng.cam.ac.uk/).  The major modifications we made to HTK are listed below:
 - Context clustering based on MDL criterion (instead of ML one)
 - Stream-dependent context clustering
 - Multi-space probability distribution as state output probability (for pitch pattern modeling)
 - State duration modeling and clustering
 
 Related publications about the techniques and algorithms used in
 HTS can be found at
 http://hts.ics.nitech.ac.jp/publications.html
 > Related publications about the techniques and algorithms used in HTS can be found at http://hts.ics.nitech.ac.jp/publications.html
 
 The current version does not include any text analyzer but the
 Festival Speech Synthesis System
 (http://www.festvox.org/festival/) can be used as a text
 analyzer.  HTS version 2.0 includes a small run-time synthesis
 engine (less than 1 MB including HMMs).  Since the synthesis
 engine can run without the HTK library, it is suitable for use
 with Festival.
 > The current version does not include any text analyzer but the
 Festival Speech Synthesis System (http://www.festvox.org/festival/) can be used as a text analyzer.  HTS version 2.0 includes a small run-time synthesis engine (less than 1 MB including HMMs).  Since the synthesis engine can run without the HTK library, it is suitable for use with Festival.
 
 This distribution comes with a demo script using "CMU ARCTIC US
 English slt" (http://www.festvox.org/cmu_arctic/dbs_slt.html),
 which generates "voices" for Festival.
 > This distribution comes with a demo script using "CMU ARCTIC US English slt" (http://www.festvox.org/cmu_arctic/dbs_slt.html), which generates "voices" for Festival.
 
 Six HTS voices for Festival trained by using "CMU ARCTIC
 database" (http://www.festvox.org/cmu_arctic/) are also released
 with HTS version 2.0.  Each HTS voice consists of HMMs
 trained by the demo script and the small run-time synthesis
 engine, and can be used as a "voice" of Festival Speech
 Synthesis System without any other HTS tools.
 > Six HTS voices for Festival 1.95 & 1.96 trained by using "CMU ARCTIC database" (http://www.festvox.org/cmu_arctic/) are also released with HTS version 2.0.  Each HTS voice consists of HMMs trained by the demo script, and can be used as a "voice" for the Festival Speech Synthesis System without any other HTS tools.
 
 *** Notes for Japanese speech synthesis *** [#nc35f807]
 > A demo script using the Nitech database for speech synthesis "Nitech Jp ATR503 m001" is also provided for training Japanese voices. Voices trained by the demo script can be used on GalateaTalk, which is a speech synthesis module of an open-source toolkit for anthropomorphic spoken dialogue agents developed in the Galatea project (http://hil.t.u-tokyo.ac.jp/~galatea/), without any other HTS tools.
 
 A demo script using the NIT database for speech synthesis "NIT
 JP ATR503 m001" is also prepared for training Japanese voices.
 > An HTS voice for GalateaTalk trained by the demo script is also released with HTS version 2.0.
 
 Voices trained by the demo script can be used on GalateaTalk,
 which is a speech synthesis module of an open-source toolkit for
 anthropomorphic spoken dialogue agents developed in Galatea
 project (http://hil.t.u-tokyo.ac.jp/~galatea/), without any
 other HTS tools.
 
 An HTS voice for GalateaTalk trained by the demo script is also
 released with HTS version 2.0.
 ** What's new in version 2.0 [#n0947bd6]
 - Based on [[HTK-3.4>http://htk.eng.cam.ac.uk/download.shtml]]
 - Compilation without [[SPTK>http://kt-lab.ics.nitech.ac.jp/~tokuda/SPTK/index.html]]
 - Thousands of bug fixes
 - HRest can generate state duration densities (-g option)
 - Model boundaries can be given to HERest (-e option)~
 A subset of the model boundaries (e.g., pause positions) may be specified.
 - Reduced-memory implementation of context clustering in HHEd (-r option)
 - Each decision tree can have a name with regular expression (-p option)~
  TB 000 {(*-a+*, *-i+*, *-u+*).state[2]}
  TB 000 {(*-sil+*, *-pau+*).state[3]}
 As a result, two different trees can be constructed for consonants and vowels, respectively.
 - The interface of HMGenS has been switched from ''HHEd-style'' to ''HERest-style''
 - Flexible model structures in HMGenS (the previous version assumed that the first stream was mcep and the others were log f0).  Non-left-to-right models and full covariance matrices for state output pdfs can also be used.
 - EM-based parameter generation algorithm (-c option), i.e., mixtures of Gaussians can be used.
 -- -c 0: Cholesky decomposition
 -- -c 1: EM (with fixed state sequence)
 -- -c 2: EM (phone boundaries can be given with -e option)
 - Random generation algorithm (set config. variable RNDPG = TRUE)
 - Speaker adaptation, adaptive training, and semi-tied covariance transforms are supported for multi-stream HMMs/MSD-HMMs. 
 -- MLLRMEAN, MLLRCOV, and CMLLR-based adaptation.
 -- CMLLR-based adaptive training.
 -- Decision trees for context clustering can be used for definition of regression classes for adaptation.
 -- HMGenS can read MLLRMEAN, MLLRCOV, CMLLR, and SEMIT transforms for adaptation.
 - MAP adaptation is also supported.
 - Performance improvements in hts_engine 
 - Miscellaneous changes.
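
The pattern lists in the TB example above select models by wildcard. As a rough illustration only, the sketch below mimics that selection in Python with shell-style wildcards from the standard `fnmatch` module. It is not HHEd code, and HHEd's own matching rules may differ. The tree names (`vowel_state2`, `silence_state3`) and the `left-center+right` model names are hypothetical and chosen just for this example.

```python
from fnmatch import fnmatchcase

# Hypothetical tree names mapped to wildcard patterns that mirror the
# TB example above. Neither the names nor this table come from HTS.
TREES = {
    "vowel_state2": ["*-a+*", "*-i+*", "*-u+*"],
    "silence_state3": ["*-sil+*", "*-pau+*"],
}


def trees_for(model_name):
    """Return the names of all trees whose patterns match model_name."""
    return [
        tree
        for tree, patterns in TREES.items()
        if any(fnmatchcase(model_name, pattern) for pattern in patterns)
    ]


# Model names below assume a "left-center+right" naming scheme.
print(trees_for("k-a+t"))    # center phone "a" matches *-a+*
print(trees_for("a-sil+k"))  # center phone "sil" matches *-sil+*
print(trees_for("a-k+i"))    # center phone "k" matches no pattern
```

Under these assumptions, each model falls into at most one of the two groups, which is what allows a separate tree to be grown for each group.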
 
 **************************************************************** [#a5992994]
 		   What's new in version 2.0
 **************************************************************** [#q119df7e]
 ** Copying [#l9acd2a9]
 > The basic core system of [[HTS version 2.0:http://hts.ics.nitech.ac.jp/]] is released as a patch code to [[HTK version 3.4:http://htk.eng.cam.ac.uk/]].  The patch code is released under a free license, without commercial restrictions.  However, it should be noted that once you apply the patch to the HTK source code, you must obey the license of HTK.
 
  * Based on HTK-3.4alpha
  * Compilation without SPTK
  * Thousands of bug fixes
  * HCompV calculates Variance floor in double.
  * HRest can generate state duration density (-g option).
  * Phone boundaries can be given to HERest (-e option).
    A subset of the phone boundaries, e.g., pause positions, may be specified.
  * Reduced-memory implementation of context clustering of HHEd
    (-r option).
  * Each decision tree can have a name with regular expression
    (-p option).
      ex. TB 000 {(*-a+*, *-i+*, *-u+*).state[2]}
    As a result, two different trees can be constructed for
    consonants and vowels, respectively.
  * The interface of HMGenS has been changed from "HHEd-like" to
    "HERest-like."
  * Flexible model structures in HMGenS (the previous version
    assumed that the first stream was mcep and the others were
    log f0).  Non-left-to-right models and full covariance
    matrices for state Gaussians can also be used.
  * EM-based parameter generation algorithm (-c option), i.e.,
    multi-mixture models can be used.
      -c 0: Cholesky decomposition
      -c 1: EM (with fixed state sequence)
      -c 2: EM (Phone boundaries can be given with -e option)
  * Speaker adaptation is supported for MSD-streams (for f0
    modeling).
     - MLLRMEAN, MLLRCOV, and CMLLR are supported.  SAT
       (CMLLR-based) is also supported.
     - Decision trees for clustering can be used for definition
       of regression classes for adaptation.
     - HMGenS can read regression matrices for adaptation.
  * miscellaneous changes
 > Although the patch code is free, we still offer no warranties and no maintenance.  We will continue to endeavor to fix bugs and answer queries when we can, but are not in a position to guarantee it.  We will consider consultancy if desired; please contact us for details.
 
 **************************************************************** [#k2ada98d]
                             Copying
 **************************************************************** [#zc42822f]
 > If you are using HTS version 2.0 in a commercial environment, even though no license is required, we would be grateful if you let us know as it helps justify ourselves to our various sponsors.
 
 The basic core system of HMM-Based Speech Synthesis System (HTS)
 version 2.0 (http://hts.ics.nitech.ac.jp/) is released as a
 patch code to HTK version 3.4 (http://htk.eng.cam.ac.uk/).  The
 patch code is released under a free license, without commercial
 restrictions.  However, it should be noted that once you apply
 the patch to the HTK source code, you must obey the license of
 HTK.
 
 Although the patch code is free, we still offer no warranties
 and no maintenance.  We will continue to endeavor to fix bugs
 and answer queries when we can, but are not in a position to
 guarantee it.  We will consider consultancy if desired; please
 contact us for details.
 
 If you are using the HMM-based Speech Synthesis System (HTS)
 version 2.0 in a commercial environment, even though no license
 is required, we would be grateful if you let us know as it helps
 justify ourselves to our various sponsors.
 
 The current copyright on the core system is
 
 ----------------------------------------------------------------
 	  The HMM-Based Speech Synthesis System (HTS)
 > The current copyright on the core system is
               The HMM-Based Speech Synthesis System (HTS)
                       HTS Working Group
 
  
                  Department of Computer Science
                  Nagoya Institute of Technology
                               and
   Interdisciplinary Graduate School of Science and Engineering
                  Tokyo Institute of Technology
 
  
                     Copyright (c) 2001-2006
                       All Rights Reserved.
                                                                        
 Permission is hereby granted, free of charge, to use and
 distribute this software in the form of patch code to HTK and
 its documentation without restriction, including without
 limitation the rights to use, copy, modify, merge, publish,
 distribute, sublicense, and/or sell copies of this work, and to
 permit persons to whom this work is furnished to do so, subject
 to the following conditions:
 
  Permission is hereby granted, free of charge, to use and
  distribute this software in the form of patch code to HTK and
  its documentation without restriction, including without
  limitation the rights to use, copy, modify, merge, publish,
  distribute, sublicense, and/or sell copies of this work, and to
  permit persons to whom this work is furnished to do so, subject
  to the following conditions:
  
   1. Once you apply the HTS patch to HTK, you must obey the
      license of HTK.
 
  
   2. The code must retain the above copyright notice, this list
      of conditions and the following disclaimer.
 
  
   3. Any modifications must be clearly marked as such.
  
  NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSTITUTE OF TECHNOLOGY,
  HTS WORKING GROUP, AND THE CONTRIBUTORS TO THIS WORK DISCLAIM
  ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL
  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT
  SHALL NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSTITUTE OF
  TECHNOLOGY, HTS WORKING GROUP, NOR THE CONTRIBUTORS BE LIABLE
  FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
  DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
  WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTUOUS
  ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
  PERFORMANCE OF THIS SOFTWARE.
 
 NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSTITUTE OF TECHNOLOGY,
 HTS WORKING GROUP, AND THE CONTRIBUTORS TO THIS WORK DISCLAIM
 ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL
 IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT
 SHALL NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSTITUTE OF
 TECHNOLOGY, HTS WORKING GROUP, NOR THE CONTRIBUTORS BE LIABLE
 FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
 DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
 WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTUOUS
 ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
 PERFORMANCE OF THIS SOFTWARE.
 ----------------------------------------------------------------
 
 Several tools in HTS version 2.0 are independent of HTK (though
 most of them use the HTK library).  The copyright of these tools
 is
 
 ----------------------------------------------------------------
 	  The HMM-Based Speech Synthesis System (HTS)
 > Several tools in HTS version 2.0 are independent of HTK (though most of them use the HTK library).  The copyright of these tools is
  	  The HMM-Based Speech Synthesis System (HTS)
                       HTS Working Group
 
  
                  Department of Computer Science
                  Nagoya Institute of Technology
                               and
   Interdisciplinary Graduate School of Science and Engineering
                  Tokyo Institute of Technology
 
  
                     Copyright (c) 2001-2006
                       All Rights Reserved.
 
 Permission is hereby granted, free of charge, to use and
 distribute this software and its documentation without
 restriction, including without limitation the rights to use,
 copy, modify, merge, publish, distribute, sublicense, and/or
 sell copies of this work, and to permit persons to whom this
 work is furnished to do so, subject to the following conditions:
 
  
  Permission is hereby granted, free of charge, to use and
  distribute this software and its documentation without
  restriction, including without limitation the rights to use,
  copy, modify, merge, publish, distribute, sublicense, and/or
  sell copies of this work, and to permit persons to whom this
  work is furnished to do so, subject to the following conditions: 
  
   1. The code must retain the above copyright notice, this list
      of conditions and the following disclaimer.
 
  
   2. Any modifications must be clearly marked as such.
 
  
   3. Commercial products derived from this software must retain
      the above copyright notice, and the following disclaimer.
      Otherwise, one must contact the HTS working group.
  
  NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSTITUTE OF TECHNOLOGY,
  HTS WORKING GROUP, AND THE CONTRIBUTORS TO THIS WORK DISCLAIM
  ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL
  IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT
  SHALL NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSTITUTE OF
  TECHNOLOGY, HTS WORKING GROUP, NOR THE CONTRIBUTORS BE LIABLE
  FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
  DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
  WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTUOUS
  ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
  PERFORMANCE OF THIS SOFTWARE.
 
 NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSTITUTE OF TECHNOLOGY,
 HTS WORKING GROUP, AND THE CONTRIBUTORS TO THIS WORK DISCLAIM
 ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL
 IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT
 SHALL NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSTITUTE OF
 TECHNOLOGY, HTS WORKING GROUP, NOR THE CONTRIBUTORS BE LIABLE
 FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
 DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
 WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTUOUS
 ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
 PERFORMANCE OF THIS SOFTWARE.
 ----------------------------------------------------------------
 
 **************************************************************** [#zb419566]
                           Installation
 **************************************************************** [#l2c57c18]
 
 See the file "INSTALL" in the root directory.  Note that HTS
 ** Installation [#s9493534]
 > See the file "INSTALL" in the root directory.  Note that HTS
 needs HTK.