Publications
This page aims to collect HTS-related publications.
If you would like to add your publications to this page, please contact us.
This page is frozen.
Basic core techniques
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, T. Kitamura, Speech parameter generation algorithms for HMM-based speech synthesis, Proc. of ICASSP, pp.1315-1318, June 2000.
- K. Tokuda, T. Masuko, N. Miyazaki, T. Kobayashi, Multi-space probability distribution HMM, IEICE Trans. Inf. & Syst., vol.E85-D, no.3, pp.455-464, March 2002.
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, T. Kitamura, Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis, Proc. of Eurospeech, pp.2347-2350, Sept. 1999.
- T. Yoshimura, Simultaneous modeling of phonetic and prosodic parameters, and characteristic conversion for HMM-based text-to-speech systems, Ph.D. thesis, Nagoya Institute of Technology, Jan. 2002.
- K. Tokuda, H. Zen, A.W. Black, An HMM-based speech synthesis system applied to English, Proc. of 2002 IEEE SSW, Sept. 2002.
- K. Tokuda, H. Zen, A.W. Black, HMM-based approach to multilingual speech synthesis, Text to speech synthesis: New paradigms and advances, S. Narayanan, A. Alwan (Eds.), Prentice Hall, 2004.
- A.W. Black, H. Zen, K. Tokuda, Statistical parametric speech synthesis, Proc. of ICASSP, pp.1229-1232, Apr. 2007.
- H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko, A.W. Black, K. Tokuda, The HMM-based speech synthesis system version 2.0, Proc. of ISCA SSW6, Bonn, Germany, Aug. 2007.
Acoustic modeling
- H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, T. Kitamura, Hidden semi-Markov model based speech synthesis, Proc. of ICSLP 2004, vol.II, pp.1397-1400, Oct. 2004.
- H. Zen, K. Tokuda, T. Kitamura, An introduction of trajectory model into HMM-based speech synthesis, Proc. of 5th ISCA Speech Synthesis Workshop, June 2004.
- Y.-J. Wu, R.-H. Wang, Minimum generation error training for HMM-based speech synthesis, Proc. of ICASSP, pp.89-92, 2006.
- Y. Nankaku, H. Zen, K. Tokuda, T. Kitamura, T. Masuko, A Bayesian approach to HMM-based speech synthesis, Tech. rep. of IEICE, vol.103, pp.193-77, 2003 (in Japanese).
- H. Zen, Implementing an HSMM-based speech synthesis system using an efficient forward-backward algorithm, Technical Report of Nagoya Institute of Technology, TR-SP-0001, Dec. 2007.
Speaker adaptation
- J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, J. Isogai, Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm, IEEE Trans. Audio, Speech, and Language Processing, vol.17, no.1, pp.66-83, Jan. 2009.
- J. Yamagishi, T. Kobayashi, Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training, IEICE Trans. on Inf. & Syst., vol.E90-D, no.2, pp.533-543, Feb. 2007.
- J. Yamagishi, M. Tamura, T. Masuko, K. Tokuda, T. Kobayashi, A training method of average voice model for HMM-based speech synthesis, IEICE Trans. on Fundamentals, vol.E86-A, no.8, pp.1956-1963, Aug. 2003.
- J. Yamagishi, M. Tamura, T. Masuko, K. Tokuda, T. Kobayashi, A context clustering technique for average voice models, IEICE Trans. Inf. & Syst., vol.E86-D, no.3, pp.534-542, March 2003.
- M. Tamura, T. Masuko, K. Tokuda, T. Kobayashi, Text-to-speech synthesis with arbitrary speaker's voice from average voice, Proc. of Eurospeech, pp.345-348, Sept. 2001.
- M. Tamura, T. Masuko, K. Tokuda, T. Kobayashi, Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR, Proc. of ICASSP, pp.805-808, May 2001.
- M. Tamura, T. Masuko, K. Tokuda, T. Kobayashi, Speaker adaptation for HMM-based speech synthesis system using MLLR, Proc. ESCA/COCOSDA Workshop on Speech Synthesis, pp.273-276, Nov. 1998.
- T. Masuko, K. Tokuda, T. Kobayashi, S. Imai, Voice characteristics conversion for HMM-based speech synthesis system, Proc. of ICASSP, pp.1611-1614, Apr. 1997.
- T. Masuko, K. Tokuda, T. Kobayashi, S. Imai, HMM-based speech synthesis with various voice characteristics, Proc. of ASA and ASJ 3rd Joint Meeting, pp.1043-1046, Dec. 1996.
- J. Yamagishi, T. Masuko, T. Kobayashi, HMM-based expressive speech synthesis -- Towards TTS with arbitrary speaking styles and emotions, Proc. of Special Workshop in Maui (SWIM), Jan. 2004.
- J. Yamagishi, Average-voice-based speech synthesis, Ph.D. thesis, Tokyo Institute of Technology, March 2006.
- L. Qin, Y.-J. Wu, Z.-H. Ling, R.-H. Wang, Improving the performance of HMM-based voice conversion using context clustering decision tree and appropriate regression matrix, Proc. of Interspeech, pp.2250-2253, Sept. 2006.
- J. Yamagishi, T. Kobayashi, S. Renals, S. King, H. Zen, T. Toda, K. Tokuda, Improved Average-Voice-based Speech Synthesis using Gender-Mixed Modeling and A Parameter Generation Algorithm considering GV, Proc. ISCA SSW6, Aug. 2007.
Speaker interpolation
- T. Yoshimura, T. Masuko, K. Tokuda, T. Kobayashi, T. Kitamura, Speaker interpolation in HMM-based speech synthesis system, Proc. of Eurospeech, pp.2523-2526, Sept. 1997.
- T. Yoshimura, T. Masuko, K. Tokuda, T. Kobayashi, T. Kitamura, Speaker interpolation for HMM-based speech synthesis system, J. Acoust. Soc. Jpn. (E), vol.21, no.4, pp.199-206, 2000.
Expressive speech synthesis
- J. Yamagishi, K. Onishi, T. Masuko, T. Kobayashi, Acoustic modeling of speaking styles and emotional expressions in HMM-based speech synthesis, IEICE Trans. on Inf. & Syst., vol.E88-D, no.3, pp.503-509, March 2005.
- M. Tachibana, J. Yamagishi, T. Masuko, T. Kobayashi, Speech synthesis with various emotional expressions and speaking styles by style interpolation and morphing, IEICE Trans. Inf. & Syst., vol.E88-D, no.11, pp.2484-2491, Nov. 2005.
- M. Tachibana, J. Yamagishi, T. Masuko, T. Kobayashi, A style adaptation technique for speech synthesis using HSMM and suprasegmental features, IEICE Trans. on Inf. & Syst., vol.E89-D, no.3, pp.1092-1099, March 2006.
- T. Nose, J. Yamagishi, T. Masuko, T. Kobayashi, A style control technique for HMM-based expressive speech synthesis, IEICE Trans. Inf. & Syst., vol.E90-D, no.9, pp.1406-1413, Sept. 2007.
- T. Nose, M. Tachibana, T. Kobayashi, HMM-based style control for expressive speech synthesis with arbitrary speaker's voice using model adaptation, IEICE Trans. Inf. & Syst., vol.E92-D, no.3, pp.489-497, Mar. 2009.
Eigenvoices
- K. Shichiri, A. Sawabe, K. Tokuda, T. Masuko, T. Kobayashi, T. Kitamura, Eigenvoices for HMM-based speech synthesis, Proc. of ICSLP, pp.1269-1272, Sept. 2002.
Excitation model
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, T. Kitamura, Mixed excitation for HMM-based speech synthesis, Proc. of Eurospeech, pp.2259-2262, Sept. 2001.
- S.-J. Kim, M.-S. Hahn, Two-band excitation for HMM-based speech synthesis, IEICE Trans. Inf. & Syst., vol.E90-D, no.1, pp.378-381, Jan. 2007.
- C. Hemptinne, Integration of the harmonic plus noise model (HNM) into the hidden Markov model-based speech synthesis system (HTS), Master thesis, IDIAP Research Institute, June 2006.
- R. Maia, T. Toda, H. Zen, Y. Nankaku, K. Tokuda, An excitation model for HMM-based speech synthesis based on residual modeling, Proc. ISCA SSW6, Aug. 2007.
- X. Gonzalvo, J.C. Socoro, I. Iriondo, C. Monzo, E. Martinez, Linguistic and mixed excitation improvements on a HMM-based speech synthesis for Castilian Spanish, Proc. ISCA SSW6, Aug. 2007.
Blizzard Challenge
- H. Zen, T. Toda, M. Nakamura, K. Tokuda, Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005, IEICE Trans. Inf. & Syst. vol.E90-D, No.1, pp.325-333, Jan. 2007.
- H. Zen, T. Toda, K. Tokuda, The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006, Proc. of Blizzard Challenge 2006 workshop, Sept. 2006.
- H. Zen, T. Toda, An overview of Nitech HMM-based speech synthesis system for Blizzard Challenge 2005, Proc. of Interspeech 2005 (Eurospeech), pp.93-96, Sept. 2005.
- J. Yamagishi, H. Zen, T. Toda, K. Tokuda, Speaker-Independent HMM-based Speech Synthesis System - HTS-2007 System for the Blizzard Challenge 2007, Proc. of Blizzard Challenge 2007 workshop, Aug. 2007.
- Z.-H. Ling, Y.-J. Wu, Y.-P. Wang, L. Qin, R.-H. Wang, USTC system for Blizzard Challenge 2006 an improved HMM-based speech synthesis method, Proc. of Blizzard Challenge 2006 workshop, Sept. 2006.
- Z.-H. Ling, L. Qin, H. Lu, Y. Gao, L.-R. Dai, R.-H. Wang, Y. Jiang, Z.-W. Zhao, J.-H. Yang, J. Chen, G.-P. Hu, The USTC and iFlytek Speech Synthesis Systems for Blizzard Challenge 2007, Proc. of Blizzard Challenge 2007 workshop, Aug. 2007.
- J. Yamagishi, T. Nose, H. Zen, T. Toda, K. Tokuda, Performance Evaluation of The Speaker-Independent HMM-based Speech Synthesis System "HTS-2007" for the Blizzard Challenge 2007, Proc. of ICASSP, Apr. 2008.
Practical implementation
- S.-J. Kim, J.-J. Kim, M.-S. Hahn, HMM-based Korean speech synthesis system for hand-held devices, IEEE Trans. Consumer Electronics, vol.52, no.4, pp.1384-1390, Nov. 2006.
Multilingual
- R. Maia, H. Zen, K. Tokuda, T. Kitamura, F.G. Resende Jr., Towards the development of a Brazilian Portuguese text-to-speech system based on HMM, Proc. of Eurospeech, pp.2465-2468, Sept. 2003.
- O. Abdel-Hamid, S. Abdou, M. Rashwan, Improving Arabic HMM based speech synthesis quality, Proc. of Interspeech, pp.1332-1335, 2006.
- Y. Qian, F. Soong, Y. Chen, M. Chu, An HMM-based Mandarin Chinese text-to-speech system, Proc. of ISCSLP, Dec. 2006.
- S.-J. Kim, J.-J. Kim, M.-S. Hahn, Implementation and evaluation of an HMM-based Korean speech synthesis system, IEICE Trans. Inf. & Syst., vol.E89-D, no.3, pp.1116-1119, 2006.
- C. Weiss, R. Maia, K. Tokuda, W. Hess, Low resource HMM-based speech synthesis applied to German, Proc. of ESSP, 2005.
- C. Plahl, Sprachsynthese mit Hidden Markov Modellen, Master thesis, Bielefeld University, 2005 (in German).
- M. Barros, R. Maia, K. Tokuda, D. Freitas, F.G. Resende Jr., HMM-based European Portuguese speech synthesis, Proc. of Interspeech, pp.2581-2584, 2005.
- A. Lundgren, An HMM-based text-to-speech system applied to Swedish, Master thesis, Royal Institute of Technology (KTH), 2005.
- T. Ojala, Auditory quality evaluation of present Finnish text-to-speech systems, Master thesis, Helsinki University of Technology, 2006.
- M. Vainio, A. Suni, P. Sirjola, Developing a Finnish concept-to-speech system, Proc. of 2nd Baltic conference on Human Language Technologies, pp.201-206, 2005.
- B. Vesnicer, F. Mihelic, Evaluation of the Slovenian HMM-based speech synthesis system, Proc. of TSD, pp.513-520, 2004.
- M. Homayounpour, S. Mehdi, Farsi speech synthesis using hidden Markov model and decision trees, The CSI Journal on Computer Science and Engineering, vol.2, no.1&3(a), 2004 (in Farsi).
- J. Latorre, K. Iwano, S. Furui, New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer, Speech Communication, vol.48, no.10, pp.1227-1242, Oct. 2006.
- J. Latorre, K. Iwano, S. Furui, Polyglot synthesis using a mixture of monolingual corpora, Proc. of ICASSP, pp.1-4, 2005.
- S. Martincic-Ipsic, I. Ipsic, Croatian HMM-based speech synthesis, Journal of Computing and Information Technology, vol.14, no.4, pp.307-313, 2006.
- X. Gonzalvo, I. Iriondo, J. Socoró, F. Alías, C. Monzo, HMM-based Spanish speech synthesis using CBR as F0 estimator, ISCA Tutorial and Research Workshop on Non Linear Speech Processing (NOLISP07), 2007.
- S. Chomphan, T. Kobayashi, Implementation and Evaluation of an HMM-based Thai Speech Synthesis System, Proc. of Interspeech, 2007.
- S. Krstulovic, A. Hunecke, M. Schroeder, An HMM-Based Speech Synthesis System applied to German and its Adaptation to a Limited Set of Expressive Football Announcements, Proc. of Interspeech, 2007.
Singing voice synthesis
- K. Saino, H. Zen, Y. Nankaku, A. Lee, K. Tokuda, HMM-based singing voice synthesis system, Proc. Interspeech, pp.1141-1144, Sept. 2006.
Application of HTS
Hybrid approaches
- H. Kawai, T. Toda, J. Yamagishi, T. Hirai, J. Ni, N. Nishizawa, M. Tsuzaki, K. Tokuda, XIMERA: a concatenative speech synthesis system with large scale corpora, IEICE Trans., vol.J89-D-II, no.12, pp.2688-2698, Dec. 2006.
- T. Hirai, J. Yamagishi, S. Tenpaku, Utilization of an HMM-Based Feature Generation Module in 5 ms Segment Concatenative Speech Synthesis, Proc. ISCA SSW6, Aug. 2007.
Motion synthesis
- K. Mori, Y. Nankaku, C. Miyajima, K. Tokuda, and T. Kitamura, Motion generation for Japanese finger language based on hidden Markov models, Proc. FIT, pp.569–570, 2005.
- N. Niwase, J. Yamagishi, T. Kobayashi, Human Walking Motion Synthesis with Desired Pace and Stride Length Based on HSMM, IEICE Trans. Inf. & Syst. vol.E88-D, No.11, pp.2492-2499, Nov. 2005.
- G. Hofer, H. Shimodaira, J. Yamagishi, Speech driven Head Motion Synthesis based on a Trajectory Model, Proc. SIGGRAPH2007 Poster, 2007.
- O. Govokhina, G. Bailly, G. Breton, and P. Bagshaw, TDA: a new trainable trajectory formation system for facial animation, Proc. Interspeech, pp.1274–1247, 2006.
Handwriting recognition
- L. Ma, Y.-J. Wu, P. Liu, and F. Soong, A MSD-HMM approach to pen trajectory modeling for online handwriting recognition, Proc. ICDAR, pp.128–132, 2007.
Audio-visual
- M. Tamura, S. Kondo, T. Masuko, and T. Kobayashi, Text-to-audiovisual speech synthesis based on parameter generation from HMM, Proc. Eurospeech, pp.959–962, 1999.
- S. Sako, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, HMM-based text-to-audio-visual speech synthesis, Proc. ICSLP, pp.25–28, 2000.
- T. Ishikawa, Y. Sawada, H. Zen, Y. Nankaku, C. Miyajima, K. Tokuda, and T. Kitamura, Audio-visual large vocabulary continuous speech recognition based on early integration, Proc. FIT, pp.203–204, 2002.
ASR
- R. Terashima, T. Yoshimura, T. Wakita, K. Tokuda, and T. Kitamura, An evaluation method of ASR performance by HMM-based speech synthesis, Proc. Spring Meeting of ASJ, pp.159–160, 2003.
- K. Emoto, H. Zen, K. Tokuda, and T. Kitamura, Accent type recognition for automatic prosodic labeling, Proc. Autumn Meeting of ASJ, pp.225–226, 2003.
- H.L. Wang, Y. Qian, F. Soong, J.L. Zhou, and J.Q. Han, A multi-space distribution (MSD) approach to speech recognition of tonal languages, Proc. of Interspeech, pp.125–128, 2006.
- L. Zhang, C. Huang, M. Chu, F. Soong, X. Zhang, and Y. Chen, Automatic detection of tone mispronunciation in Mandarin, Proc. ISCSLP, pp.590–601, 2006.
- K. Tanaka, S. Kuroiwa, S. Tsuge, and F. Ren, An acoustic model adaptation using HMM-based speech synthesis, Proc. NLPKE, pp.368–373, 2003.
- M. Ishihara, C. Miyajima, N. Kitaoka, K. Itou, and K. Takeda, An approach for training acoustic models based on the vocabulary of the target speech recognition task, Proc. Spring Meeting of ASJ, pp.153–154, 2007.
Feature mapping
- K. Richmond, A trajectory mixture density network for the acoustic-articulatory inversion mapping, Proc. of Interspeech, pp.577–580, 2006.
Speech coding
- T. Hoshiya, S. Sako, H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, Improving the performance of HMM-based very low bitrate speech coding, Proc. ICASSP, pp.800–803, 2003.