I worked on a Mandarin TTS system using HTS several years ago. It was based on an MBE (multiband excitation) codec. The synthesized voices sounded a little smeared because the LSP order was too low (10?). Finding accurate F0, F1, F2, ... is very difficult, so I do believe MBE is a better choice.

Hi,

Currently HTS employs the fundamental frequency (F0) as an excitation parameter. I wonder if there has been a study or publication on also using higher-level frequencies (F1, F2, ...) in order to model voiced excitation more effectively.

Thanks in advance.
Fatih Kıralioğlu
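
For readers unfamiliar with the multiband excitation idea mentioned in the reply, here is a minimal sketch of how a per-band voiced/unvoiced decision can shape one frame of excitation. It is not the MBE codec used in that system and not HTS code. The function name `mbe_excitation`, the equal-width band split, the 16 kHz sample rate, and the use of random-phase sinusoids as a crude stand-in for band-limited noise are all assumptions made for illustration.

```python
import math
import random


def mbe_excitation(f0, band_voiced, fs=16000, n_samples=320, seed=0):
    """Return one frame of mixed excitation as a list of floats.

    band_voiced[i] says whether band i is voiced, where the range
    0..fs/2 is split into len(band_voiced) equal-width bands
    (an assumption; real codecs choose bands differently).
    """
    rng = random.Random(seed)
    nyquist = fs / 2.0
    band_width = nyquist / len(band_voiced)
    frame = [0.0] * n_samples

    k = 1
    while k * f0 < nyquist:
        # Find the band that contains the k-th harmonic of F0.
        band = min(int(k * f0 / band_width), len(band_voiced) - 1)
        # Voiced band: harmonic with a fixed phase.
        # Unvoiced band: random phase, a rough noise substitute.
        if band_voiced[band]:
            phase = 0.0
        else:
            phase = rng.uniform(0.0, 2.0 * math.pi)
        w = 2.0 * math.pi * k * f0 / fs
        for n in range(n_samples):
            frame[n] += math.cos(w * n + phase)
        k += 1
    return frame


# Example: 200 Hz pitch, lower half of the spectrum voiced, upper half unvoiced.
frame = mbe_excitation(200.0, [True, True, False, False])
```

The point of the sketch is only that the excitation needs F0 plus a handful of voicing flags, rather than accurate estimates of F1, F2, and so on.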