History source of HTS-2.0 README(No. 1) - HMM/DNN-based speech synthesis system (HTS)

================================================================
	  The HMM-based Speech Synthesis System (HTS)
	     version 2.0 release December 25, 2006
                                
The HMM-Based Speech Synthesis System (HTS)
(http://hts.ics.nitech.ac.jp/) is being developed by the HTS
working group (see "Who we are" below) and others (see
"Acknowledgments" in the separate file).  The training part of
HTS was implemented as a modified version of HTK
(http://htk.eng.cam.ac.uk/).  The major modifications we made to
HTK are listed below:

 - Context clustering based on the MDL criterion (instead of the
   ML criterion)
 - Stream-dependent context clustering
 - Multi-space probability distribution as state output
   probability (for pitch pattern modeling)
 - State duration modeling and clustering
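
As a rough illustration of the first item, the sketch below
evaluates one candidate split under an MDL-style criterion.  It
assumes a single diagonal-covariance Gaussian per node and a
commonly cited form of the description-length change.  The
function names, the (occupancy, variances) node layout, and the
"weight" factor are hypothetical and are not taken from the HTS
source.  A split is accepted only when the description length
decreases, so no hand-tuned likelihood threshold is needed, as
it would be with an ML criterion.

```python
import math


def log_det_diag(variances):
    """Log-determinant of a diagonal covariance matrix."""
    return sum(math.log(v) for v in variances)


def mdl_delta(parent, yes, no, total_occ, weight=1.0):
    """Change in description length if `parent` is split into `yes`/`no`.

    Each node is an (occupancy, variances) pair; this layout is an
    assumption made for the sketch.
    """
    occ_p, var_p = parent
    occ_y, var_y = yes
    occ_n, var_n = no
    dim = len(var_p)
    # Change in negative log-likelihood when two Gaussians replace one.
    fit = 0.5 * (occ_y * log_det_diag(var_y)
                 + occ_n * log_det_diag(var_n)
                 - occ_p * log_det_diag(var_p))
    # Cost of the extra parameters (dim means + dim variances).
    penalty = weight * dim * math.log(total_occ)
    return fit + penalty


def should_split(parent, yes, no, total_occ):
    """Accept the split only if the description length decreases."""
    return mdl_delta(parent, yes, no, total_occ) < 0.0


if __name__ == "__main__":
    parent = (100.0, [4.0])
    # Children are much tighter than the parent: splitting pays off.
    print(should_split(parent, (50.0, [1.0]), (50.0, [1.0]), 100.0))
    # Children are no tighter than the parent: only the penalty remains.
    print(should_split(parent, (50.0, [4.0]), (50.0, [4.0]), 100.0))
```

In this toy case the first split reduces the description length
and the second does not, because the penalty term outweighs a
zero improvement in fit.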

Related publications about the techniques and algorithms used in
HTS can be found at
http://hts.ics.nitech.ac.jp/publications.html

The current version does not include a text analyzer, but the
Festival Speech Synthesis System
(http://www.festvox.org/festival/) can be used as one.  HTS
version 2.0 includes a small run-time synthesis engine (less
than 1 MB including HMMs).  Since the synthesis engine can run
without the HTK library, it is suitable for use with Festival.

This distribution comes with a demo script using "CMU ARCTIC US
English slt" (http://www.festvox.org/cmu_arctic/dbs_slt.html),
which generates "voices" for Festival.

Six HTS voices for Festival trained on the "CMU ARCTIC
database" (http://www.festvox.org/cmu_arctic/) are also released
with HTS version 2.0.  Each HTS voice consists of HMMs trained
by the demo script and the small run-time synthesis engine, and
can be used as a "voice" in the Festival Speech Synthesis System
without any other HTS tools.

*** Notes for Japanese speech synthesis ***

A demo script using the NIT speech synthesis database "NIT JP
ATR503 m001" is also provided for training Japanese voices.

Voices trained by the demo script can be used with GalateaTalk,
a speech synthesis module of an open-source toolkit for
anthropomorphic spoken dialogue agents developed in the Galatea
project (http://hil.t.u-tokyo.ac.jp/~galatea/), without any
other HTS tools.

An HTS voice for GalateaTalk trained by the demo script is also
released with HTS version 2.0.

****************************************************************
		   What's new in version 2.0
****************************************************************

 * Based on HTK-3.4alpha
 * Compilation without SPTK
 * Thousands of bug fixes
 * HCompV calculates the variance floor in double precision.
 * HRest can generate state duration densities (-g option).
 * Phone boundaries can be given to HERest (-e option).
   Only part of the phone boundaries, e.g., pause positions,
   may be specified.
 * Reduced-memory implementation of context clustering of HHEd
   (-r option).
 * Each decision tree can have a name with regular expression
   (-p option).
     ex. TB 000 {(*-a+*, *-i+*, *-u+*).state[2]}
   As a result, two different trees can be constructed for
   consonants and vowels, respectively.
 * The interface of HMGenS has been changed from "HHEd-like" to
   "HERest-like."
 * Flexible model structures in HMGenS (in the previous version,
   the first stream was assumed to be mcep, and the others were
   assumed to be log f0).  Non-left-to-right models and full
   covariance matrices for state Gaussians can also be used.
 * EM-based parameter generation algorithm (-c option), i.e.,
   multi-mixture models can be used.
     -c 0: Cholesky decomposition
     -c 1: EM (with fixed state sequence)
     -c 2: EM (Phone boundaries can be given with -e option)
 * Speaker adaptation is supported for MSD-streams (for f0
   modeling).
    - MLLRMEAN, MLLRCOV, and CMLLR are supported.  SAT
      (CMLLR-based) is also supported.
    - Decision trees for clustering can be used for definition
      of regression classes for adaptation.
    - HMGenS can read regression matrices for adaptation.
 * Miscellaneous changes
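
The -p example in the list above selects models by wildcard
patterns on their context-dependent names.  As a minimal sketch
of that idea only, the snippet below uses Python's fnmatch
module to check whether a triphone-style name such as "k-a+t"
falls under the vowel patterns from the example.  HHEd uses its
own pattern matcher, so fnmatch is only a stand-in here, and the
helper names and sample labels are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns copied from the TB example: centre phone is a, i, or u.
VOWEL_PATTERNS = ["*-a+*", "*-i+*", "*-u+*"]


def matches_any(name, patterns):
    """True if the model name matches at least one wildcard pattern."""
    return any(fnmatchcase(name, p) for p in patterns)


def partition(names, patterns):
    """Split names into those covered by the patterns and the rest."""
    matched = [n for n in names if matches_any(n, patterns)]
    rest = [n for n in names if not matches_any(n, patterns)]
    return matched, rest


if __name__ == "__main__":
    # Hypothetical left-centre+right labels for illustration.
    labels = ["k-a+t", "s-i+t", "a-k+i", "t-u+n", "i-s+u"]
    vowels, others = partition(labels, VOWEL_PATTERNS)
    print("tree for vowel patterns:", vowels)
    print("left for other trees:   ", others)
```

Names whose centre phone is not a, i, or u are left for a tree
built with a different pattern, which is how separate trees for
vowels and consonants can arise.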

****************************************************************
                            Copying
****************************************************************

The basic core system of the HMM-Based Speech Synthesis System
(HTS) version 2.0 (http://hts.ics.nitech.ac.jp/) is released as
patch code for HTK version 3.4 (http://htk.eng.cam.ac.uk/).  The
patch code is released under a free license, without commercial
restrictions.  However, note that once you apply the patch to
the HTK source code, you must obey the HTK license.

Although the patch code is free, we offer no warranties and no
maintenance.  We will continue to endeavor to fix bugs and
answer queries when we can, but are not in a position to
guarantee it.  We will consider consultancy if desired; please
contact us for details.

If you are using the HMM-Based Speech Synthesis System (HTS)
version 2.0 in a commercial environment, even though no license
is required, we would be grateful if you let us know, as it
helps us justify our work to our various sponsors.

The current copyright on the core system is

----------------------------------------------------------------
	  The HMM-Based Speech Synthesis System (HTS)
                      HTS Working Group

                 Department of Computer Science
                 Nagoya Institute of Technology
                              and
  Interdisciplinary Graduate School of Science and Engineering
                 Tokyo Institute of Technology

                    Copyright (c) 2001-2006
                      All Rights Reserved.
                                                                       
Permission is hereby granted, free of charge, to use and
distribute this software in the form of patch code to HTK and
its documentation without restriction, including without
limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of this work, and to
permit persons to whom this work is furnished to do so, subject
to the following conditions:

  1. Once you apply the HTS patch to HTK, you must obey the
     license of HTK.

  2. The code must retain the above copyright notice, this list
     of conditions and the following disclaimer.

  3. Any modifications must be clearly marked as such.

NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSTITUTE OF TECHNOLOGY,
HTS WORKING GROUP, AND THE CONTRIBUTORS TO THIS WORK DISCLAIM
ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT
SHALL NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSTITUTE OF
TECHNOLOGY, HTS WORKING GROUP, NOR THE CONTRIBUTORS BE LIABLE
FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTUOUS
ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THIS SOFTWARE.
----------------------------------------------------------------

Several tools in HTS version 2.0 are independent of HTK (though
most of them use the HTK library).  The copyright of these tools
is

----------------------------------------------------------------
	  The HMM-Based Speech Synthesis System (HTS)
                      HTS Working Group

                 Department of Computer Science
                 Nagoya Institute of Technology
                              and
  Interdisciplinary Graduate School of Science and Engineering
                 Tokyo Institute of Technology

                    Copyright (c) 2001-2006
                      All Rights Reserved.

Permission is hereby granted, free of charge, to use and
distribute this software and its documentation without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or
sell copies of this work, and to permit persons to whom this
work is furnished to do so, subject to the following conditions:

  1. The code must retain the above copyright notice, this list
     of conditions and the following disclaimer.

  2. Any modifications must be clearly marked as such.

  3. Commercial products derived from this software must retain
     the above copyright notice, and the following disclaimer.
     Otherwise, one must contact the HTS working group.

NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSTITUTE OF TECHNOLOGY,
HTS WORKING GROUP, AND THE CONTRIBUTORS TO THIS WORK DISCLAIM
ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT
SHALL NAGOYA INSTITUTE OF TECHNOLOGY, TOKYO INSTITUTE OF
TECHNOLOGY, HTS WORKING GROUP, NOR THE CONTRIBUTORS BE LIABLE
FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY
DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTUOUS
ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
PERFORMANCE OF THIS SOFTWARE.
----------------------------------------------------------------

****************************************************************
                          Installation
****************************************************************

See the file "INSTALL" in the root directory.  Note that HTS
needs HTK.