Publications - HMM/DNN-based speech synthesis system (HTS)

Publications

This page aims to collect HTS-related publications.
~~If you would like to add your publications to this page, please contact us.~~
This page is frozen.

Basic core techniques

K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, T. Kitamura, Speech parameter generation algorithms for HMM-based speech synthesis, Proc. of ICASSP, pp.1315-1318, June 2000. pdf
K. Tokuda, T. Masuko, N. Miyazaki, T. Kobayashi, Multi-space probability distribution HMM, IEICE Trans. Inf. & Syst., vol.E85-D, no.3, pp.455-464, March 2002. pdf
T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, T. Kitamura, Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis, Proc. of Eurospeech, pp.2347-2350, Sept. 1999. pdf correction
T. Yoshimura, Simultaneous modeling of phonetic and prosodic parameters, and characteristic conversion for HMM-based text-to-speech systems, Ph.D. thesis, Nagoya Institute of Technology, Jan. 2002. pdf
K. Tokuda, H. Zen, A.W. Black, An HMM-based speech synthesis system applied to English, Proc. of 2002 IEEE SSW, Sept. 2002. pdf
K. Tokuda, H. Zen, A.W. Black, HMM-based approach to multilingual speech synthesis, Text to speech synthesis: New paradigms and advances, S. Narayanan, A. Alwan (Eds.), Prentice Hall, 2004. link
A.W. Black, H. Zen, K. Tokuda, Statistical parametric speech synthesis, Proc. of ICASSP, pp.1229-1232, Apr. 2007. link
H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko, A.W. Black, K. Tokuda, The HMM-based speech synthesis system version 2.0, Proc. of ISCA SSW6, Bonn, Germany, Aug. 2007. link

Acoustic modeling

H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, T. Kitamura, Hidden semi-Markov model based speech synthesis, Proc. of ICSLP 2004, vol.II, pp.1397-1400, Oct. 2004. link
H. Zen, K. Tokuda, T. Kitamura, An introduction of trajectory model into HMM-based speech synthesis, Proc. of 5th ISCA Speech Synthesis Workshop, June 2004. link
Y.-J. Wu, R.-H. Wang, Minimum generation error training for HMM-based speech synthesis, Proc. of ICASSP, pp.89-92, 2006. link
Y. Nankaku, H. Zen, K. Tokuda, T. Kitamura, T. Masuko, A Bayesian approach to HMM-based speech synthesis, Tech. rep. of IEICE, vol.103, pp.193-77, 2003 (in Japanese).
H. Zen, Implementing an HSMM-based speech synthesis system using an efficient forward-backward algorithm, Technical Report of Nagoya Institute of Technology, TR-SP-0001, Dec. 2007. link

Speaker adaptation

J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, J. Isogai, Analysis of Speaker Adaptation Algorithms for HMM-based Speech Synthesis and a Constrained SMAPLR Adaptation Algorithm, IEEE Trans. Audio, Speech, & Language Processing, vol.17, no.1, pp.66-83, Jan. 2009. link
J. Yamagishi, T. Kobayashi, Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training, IEICE Trans. on Inf. & Syst., vol.E90-D, no.2, pp.533-543, Feb. 2007. link
J. Yamagishi, M. Tamura, T. Masuko, K. Tokuda, T. Kobayashi, A training method of average voice model for HMM-based speech synthesis, IEICE Trans. on Fundamentals, vol.E86-A, no.8, pp.1956-1963, Aug. 2003. link
J. Yamagishi, M. Tamura, T. Masuko, K. Tokuda, T. Kobayashi, A context clustering technique for average voice models, IEICE Trans. Inf. & Syst., vol.E86-D, no.3, pp.534-542, March 2003. link
M. Tamura, T. Masuko, K. Tokuda, T. Kobayashi, Text-to-speech synthesis with arbitrary speaker's voice from average voice, Proc. of Eurospeech, pp.345-348, Sept. 2001. link
M. Tamura, T. Masuko, K. Tokuda, T. Kobayashi, Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR, Proc. of ICASSP, pp.805-808, May 2001. link
M. Tamura, T. Masuko, K. Tokuda, T. Kobayashi, Speaker adaptation for HMM-based speech synthesis system using MLLR, Proc. ESCA/COCOSDA Workshop on Speech Synthesis, pp.273-276, Nov. 1998. link
T. Masuko, K. Tokuda, T. Kobayashi, S. Imai, Voice characteristics conversion for HMM-based speech synthesis system, Proc. of ICASSP, pp.1611-1614, Apr. 1997. link
T. Masuko, K. Tokuda, T. Kobayashi, S. Imai, HMM-based speech synthesis with various voice characteristics, Proc. of ASA and ASJ 3rd Joint Meeting, pp.1043-1046, Dec. 1996.
J. Yamagishi, T. Masuko, T. Kobayashi, HMM-based expressive speech synthesis -- Towards TTS with arbitrary speaking styles and emotions, Proc. of Special Workshop in Maui (SWIM), Jan. 2004. pdf
J. Yamagishi, Average-Voice-Based Speech Synthesis, Ph.D. thesis, Tokyo Institute of Technology, March 2006. link
L. Qin, Y.-J. Wu, Z.-H. Ling, R.-H. Wang, Improving the performance of HMM-based voice conversion using context clustering decision tree and appropriate regression matrix, Proc. of Interspeech, pp.2250-2253, Sept. 2006. link samples
J. Yamagishi, T. Kobayashi, S. Renals, S. King, H. Zen, T. Toda, K. Tokuda, Improved Average-Voice-based Speech Synthesis using Gender-Mixed Modeling and A Parameter Generation Algorithm considering GV, Proc. ISCA SSW6, Aug. 2007.

Speaker interpolation

T. Yoshimura, T. Masuko, K. Tokuda, T. Kobayashi, T. Kitamura, Speaker interpolation in HMM-based speech synthesis system, Proc. of Eurospeech, pp.2523-2526, Sept. 1997. link
T. Yoshimura, T. Masuko, K. Tokuda, T. Kobayashi, T. Kitamura, Speaker interpolation for HMM-based speech synthesis system, J. Acoust. Soc. Jpn. (E), vol.21, no.4, pp.199-206, 2000. link

Expressive speech synthesis

J. Yamagishi, K. Onishi, T. Masuko, T. Kobayashi, Acoustic modeling of speaking styles and emotional expressions in HMM-based speech synthesis, IEICE Trans. on Inf. & Syst., vol.E88-D, no.3, pp.503-509, March 2005. link demos
M. Tachibana, J. Yamagishi, T. Masuko, T. Kobayashi, Speech synthesis with various emotional expressions and speaking styles by style interpolation and morphing, IEICE Trans. Inf. & Syst., vol.E88-D, no.11, pp.2484-2491, Nov. 2005. link demos
M. Tachibana, J. Yamagishi, T. Masuko, T. Kobayashi, A style adaptation technique for speech synthesis using HSMM and suprasegmental features, IEICE Trans. on Inf. & Syst., vol.E89-D, no.3, pp.1092-1099, March 2006. link demos
T. Nose, J. Yamagishi, T. Masuko, T. Kobayashi, A style control technique for HMM-based expressive speech synthesis, IEICE Trans. Inf. & Syst., vol.E90-D, no.9, pp.1406-1413, Sept. 2007. link demos
T. Nose, M. Tachibana, T. Kobayashi, HMM-based style control for expressive speech synthesis with arbitrary speaker's voice using model adaptation, IEICE Trans. Inf. & Syst., vol.E92-D, no.3, pp.489-497, Mar. 2009. link demos

Eigenvoices

K. Shichiri, A. Sawabe, K. Tokuda, T. Masuko, T. Kobayashi, T. Kitamura, Eigenvoices for HMM-based speech synthesis, Proc. of ICSLP, pp.1269-1272, Sept. 2002. demo

Excitation model

T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, T. Kitamura, Mixed excitation for HMM-based speech synthesis, Proc. of Eurospeech, pp.2259-2262, Sept. 2001.
S.-J. Kim, M.-S. Hahn, Two-band excitation for HMM-based speech synthesis, IEICE Trans. Inf. & Syst., vol.E90-D, no.1, pp.378-381, Jan. 2007. link
C. Hemptinne, Integration of the harmonic plus noise model (HNM) into the hidden Markov model-based speech synthesis system (HTS), Master thesis, IDIAP Research Institute, June 2006. link
R. Maia, T. Toda, H. Zen, Y. Nankaku, K. Tokuda, An excitation model for HMM-based speech synthesis based on residual modeling, Proc. ISCA SSW6, Aug. 2007. samples
X. Gonzalvo, J.C. Socoro, I. Iriondo, C. Monzo, E. Martinez, Linguistic and mixed excitation improvements on a HMM-based speech synthesis for Castilian Spanish, Proc. ISCA SSW6, Aug. 2007. link

Blizzard Challenge

H. Zen, T. Toda, M. Nakamura, K. Tokuda, Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005, IEICE Trans. Inf. & Syst., vol.E90-D, no.1, pp.325-333, Jan. 2007. link
H. Zen, T. Toda, K. Tokuda, The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006, Proc. of Blizzard Challenge 2006 workshop, Sept. 2006. link
H. Zen, T. Toda, An overview of Nitech HMM-based speech synthesis system for Blizzard Challenge 2005, Proc. of Interspeech2005 (Eurospeech), pp.93-96, Sept. 2005. link
J. Yamagishi, H. Zen, T. Toda, K. Tokuda, Speaker-Independent HMM-based Speech Synthesis System - HTS-2007 System for the Blizzard Challenge 2007, Proc. of Blizzard Challenge 2007 workshop, Aug. 2007. link
Z.-H. Ling, Y.-J. Wu, Y.-P. Wang, L. Qin, R.-H. Wang, USTC system for Blizzard Challenge 2006: an improved HMM-based speech synthesis method, Proc. of Blizzard Challenge 2006 workshop, Sept. 2006. link samples
Z.-H. Ling, L. Qin, H. Lu, Y. Gao, L.-R. Dai, R.-H. Wang, Y. Jiang, Z.-W. Zhao, J.-H. Yang, J. Chen, G-P. Hu, The USTC and iFlytek Speech Synthesis Systems for Blizzard Challenge 2007, Proc. of Blizzard Challenge 2007 workshop, Aug. 2007. link
J. Yamagishi, T. Nose, H. Zen, T. Toda, K. Tokuda, Performance Evaluation of The Speaker-Independent HMM-based Speech Synthesis System "HTS-2007" for the Blizzard Challenge 2007, Proc. of ICASSP, Apr. 2008. paper lecture

Practical implementation

S.-J. Kim, J.-J. Kim, M.-S. Hahn, HMM-based Korean speech synthesis system for hand-held devices, IEEE Trans. Consumer Electronics, vol.52, no.4, pp.1384-1390, Nov. 2006. link

Multilingual

R. Maia, H. Zen, K. Tokuda, T. Kitamura, F.G. Resende Jr., Towards the development of a Brazilian Portuguese text-to-speech system based on HMM, Proc. of Eurospeech, pp.2465-2468, Sept. 2003.
O. Abdel-Hamid, S. Abdou, M. Rashwan, Improving Arabic HMM based speech synthesis quality, Proc. of Interspeech, pp.1332-1335, 2006.
Y. Qian, F. Soong, Y. Chen, M. Chu, An HMM-based Mandarin Chinese text-to-speech system, Proc. of ISCSLP, Dec. 2006.
S.-J. Kim, J.-J. Kim, M.-S. Hahn, Implementation and evaluation of an HMM-based Korean speech synthesis system, IEICE Trans. Inf. & Syst., vol. E89-D, no.3, pp.1116-1119, 2006. link
C. Weiss, R. Maia, K. Tokuda, W. Hess, Low resource HMM-based speech synthesis applied to German, Proc. of ESSP, 2005.
C. Plahl, Sprachsynthese mit Hidden Markov Modellen, Master thesis, Bielefeld University, 2005 (in German). link
M. Barros, R. Maia, K. Tokuda, D. Freitas, F.G. Resende Jr., HMM-based European Portuguese speech synthesis, Proc. of Interspeech, pp.2581-2584, 2005.
A. Lundgren, An HMM-based text-to-speech system applied to Swedish, Master thesis, Royal Institute of Technology (KTH), 2005. link
T. Ojala, Auditory quality evaluation of present Finnish text-to-speech systems, Master thesis, Helsinki University of Technology, 2006. link
M. Vainio, A. Suni, P. Sirjola, Developing a Finnish concept-to-speech system, Proc. of 2nd Baltic conference on Human Language Technologies, pp.201-206, 2005. link
B. Vesnicer, F. Mihelic, Evaluation of the Slovenian HMM-based speech synthesis system, Proc. of TSD, pp.513-520, 2004. link
M. Homayounpour, S. Mehdi, Farsi speech synthesis using hidden Markov model and decision trees, The CSI Journal on Computer Science and Engineering, vol.2, no.1&3(a), 2004 (in Farsi). link
J. Latorre, K. Iwano, S. Furui, New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer, Speech Communication, vol.48, no.10, pp.1227-1242, Oct. 2006. link
J. Latorre, K. Iwano, S. Furui, Polyglot synthesis using a mixture of monolingual corpora, Proc. of ICASSP, pp.1-4, 2005.
S. Martincic-Ipsic, I. Ipsic, Croatian HMM-based speech synthesis, Journal of Computing and Information Technology, vol.14, no.4, pp.307-313, 2006. link
X. Gonzalvo, I. Iriondo, J. Socoró, F. Alías, C. Monzo, HMM-based Spanish speech synthesis using CBR as F0 estimator, ISCA Tutorial and Research Workshop on Non Linear Speech Processing (NOLISP07), 2007. link
S. Chomphan, T. Kobayashi, Implementation and Evaluation of an HMM-based Thai Speech Synthesis System, Proc. of Interspeech, 2007. samples
S. Krstulovic, A. Hunecke, M. Schroeder, An HMM-Based Speech Synthesis System applied to German and its Adaptation to a Limited Set of Expressive Football Announcements, Proc. of Interspeech, 2007. link

Singing voice synthesis

K. Saino, H. Zen, Y. Nankaku, A. Lee, K. Tokuda, HMM-based singing voice synthesis system, Proc. Interspeech, pp.1141-1144, Sept. 2006. samples

Application of HTS

Hybrid approaches

H. Kawai, T. Toda, J. Yamagishi, T. Hirai, J. Ni, N. Nishizawa, M. Tsuzaki, K. Tokuda, XIMERA: a concatenative speech synthesis system with large scale corpora, IEICE Trans. J89-D-II, no.12, pp.2688-2698, Dec. 2006 link demo (in Japanese) link
T. Hirai, J. Yamagishi, S. Tenpaku, Utilization of an HMM-Based Feature Generation Module in 5 ms Segment Concatenative Speech Synthesis, Proc. ISCA SSW6, Aug. 2007.

Motion synthesis

K. Mori, Y. Nankaku, C. Miyajima, K. Tokuda, and T. Kitamura, Motion generation for Japanese finger language based on hidden Markov models, Proc. FIT, pp.569–570, 2005.
N. Niwase, J. Yamagishi, T. Kobayashi, Human Walking Motion Synthesis with Desired Pace and Stride Length Based on HSMM, IEICE Trans. Inf. & Syst., vol.E88-D, no.11, pp.2492-2499, Nov. 2005. link sample
G. Hofer, H. Shimodaira, J. Yamagishi, Speech driven Head Motion Synthesis based on a Trajectory Model, Proc. SIGGRAPH2007 Poster, 2007. sample
O. Govokhina, G. Bailly, G. Breton, and P. Bagshaw, TDA: a new trainable trajectory formation system for facial animation, Proc. Interspeech, pp.1274–1247, 2006.

Handwriting recognition

L. Ma, Y.J. Wu, P. Liu, and F. Soong, A MSD-HMM approach to pen trajectory modeling for online handwriting recognition, Proc. ICDAR, pp.128–132, 2007.

Audio-visual

M. Tamura, S. Kondo, T. Masuko, and T. Kobayashi, Text-to-audiovisual speech synthesis based on parameter generation from HMM, Proc. Eurospeech, pp.959–962, 1999.
S. Sako, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, HMM-based text-to-audio-visual speech synthesis, Proc. ICSLP, pp.25–28, 2000.
T. Ishikawa, Y. Sawada, H. Zen, Y. Nankaku, C. Miyajima, K. Tokuda, and T. Kitamura, Audio-visual large vocabulary continuous speech recognition based on early integration, Proc. FIT, pp.203–204, 2002.

ASR

R. Terashima, T. Yoshimura, T. Wakita, K. Tokuda, and T. Kitamura, An evaluation method of ASR performance by HMM-based speech synthesis, Proc. Spring Meeting of ASJ, pp.159–160, 2003.
K. Emoto, H. Zen, K. Tokuda, and T. Kitamura, Accent type recognition for automatic prosodic labeling, Proc. Autumn Meeting of ASJ, pp.225–226, 2003.
H.L. Wang, Y. Qian, F. Soong, J.L. Zhou, and J.Q. Han, A multi-space distribution (MSD) approach to speech recognition of tonal languages, Proc. of Interspeech, pp.125–128, 2006.
L. Zhang, C. Huang, M. Chu, F. Soong, X. Zhang, and Y. Chen, Automatic detection of tone mispronunciation in Mandarin, Proc. ISCSLP, pp.590–601, 2006.
K. Tanaka, S. Kuroiwa, S. Tsuge, and F. Ren, An acoustic model adaptation using HMM-based speech synthesis, Proc. NLPKE, pp.368–373, 2003.
M. Ishihara, C. Miyajima, N. Kitaoka, K. Itou, and K. Takeda, An approach for training acoustic models based on the vocabulary of the target speech recognition task, Proc. Spring Meeting of ASJ, pp.153–154, 2007.

Feature mapping

K. Richmond, A trajectory mixture density network for the acoustic-articulatory inversion mapping, Proc. of Interspeech, pp.577–580, 2006.

Speech coding

T. Hoshiya, S. Sako, H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, Improving the performance of HMM-based very low bitrate speech coding, Proc. ICASSP, pp.800–803, 2003.
