On Oct. 25 and 26 (Thursday and Friday), a small discussion workshop on expressive TTS using HMM-based speech synthesis techniques will be held at Shanghai Jiao Tong University, Shanghai, China. The programme consists of:
- Oct. 25: One-day tutorial on HMM-based synthesis and techniques for expressive TTS. (A more detailed introduction is given below.)
- Oct. 26: Discussion of open problems and possible solutions for expressive TTS using HMM-based techniques.
Title: Tutorial on HMM-based statistical speech synthesis
Tomoki Toda, Associate Professor, Nara Institute of Science and Technology (NAIST), Japan
Junichi Yamagishi, Senior Research Fellow, University of Edinburgh, U.K.
Oct. 25 (Thursday), Electronic and Information (SEIEE) Building (电信群楼) 3#410, Shanghai Jiao Tong University
Morning: 9:30 - 12:00
- General concepts and framework of HMM-based synthesis (Tomoki Toda)
- F0 modelling in HMM-based synthesis (Kai Yu, local host)
Afternoon: 14:00 - 16:30
- Adaptation techniques for HMM-based synthesis (Junichi Yamagishi)
- Expressive TTS: algorithms and trends (Junichi Yamagishi)
Traditionally, the goal of speech synthesis has been to convert text into intelligible speech. Recently, naturally expressive speech has attracted growing research interest, driven by emerging real-world applications such as augmentative and alternative communication aids and entertainment applications. This one-day tutorial will give a general overview of hidden Markov model (HMM)-based speech synthesis, which has recently been demonstrated to be very effective for synthesizing expressive speech. The main advantage of the approach is its flexibility in changing speaker identity, speaking style, and emotion, which are key components of naturally expressive speech. Many techniques for controlling variation in speech have been proposed, and this tutorial will review several major ones, including adaptation, interpolation, and multiple regression. The tutorial will also introduce one of the important challenges in speech synthesis today: how to produce reactive expressive speech for different situations.
Tomoki Toda earned his B.E. degree from Nagoya University, Aichi, Japan, in 1999 and his M.E. and D.E. degrees from the Graduate School of Information Science, NAIST, Nara, Japan, in 2001 and 2003, respectively. He was a Research Fellow of the JSPS in the Graduate School of Engineering, Nagoya Institute of Technology, Aichi, Japan, from 2003 to 2005. He was an Assistant Professor in the Graduate School of Information Science, NAIST, from 2005 to 2011, where he is currently an Associate Professor. He has also been a Visiting Researcher at NICT, Kyoto, Japan, since May 2006. From March 2001 to March 2003, he was an Intern Researcher at the ATR Spoken Language Communication Research Laboratories, Kyoto, Japan, and he remained a Visiting Researcher at ATR until March 2006. He was also a Visiting Researcher at the Language Technologies Institute, CMU, Pittsburgh, USA, from October 2003 to September 2004, and at the Department of Engineering, University of Cambridge, Cambridge, UK, from March to August 2008. His research interests include statistical approaches to speech processing, such as speech synthesis and speech analysis. He has published over 30 journal papers and 100 conference papers in this area, and has received 8 paper awards, including the 2009 Young Author Best Paper Award from the IEEE SPS. He was a member of the Speech and Language Technical Committee of the IEEE SPS from 2007 to 2009.
Junichi Yamagishi is a Senior Research Fellow and holds an EPSRC Career Acceleration Fellowship in the Centre for Speech Technology Research (CSTR) at the University of Edinburgh. He was awarded a Ph.D. by the Tokyo Institute of Technology in 2006 for a thesis that pioneered speaker-adaptive speech synthesis, and was awarded the Tejima Prize for the best Ph.D. thesis of the Tokyo Institute of Technology in 2007. Since 2006, he has been with CSTR and has authored or co-authored about 100 refereed papers in international journals and conferences. His work has led directly to three large-scale EC FP7 projects and two collaborations based around clinical applications of this technology. A recent co-authored paper received the IEEE Signal Processing Society Best Student Paper Award in 2010 and was cited as a “landmark achievement of speech synthesis.” He was awarded the Itakura Prize (Innovative Young Researchers Prize) by the Acoustical Society of Japan for his achievements in adaptive speech synthesis. He is an external member of the Euan MacDonald Centre for Motor Neurone Disease Research and the Anne Rowling Regenerative Neurology Clinic in Edinburgh.