[hts-users:00756] Problem on data preparation (building utterance file ".utt")
Hello to everyone working on HTS,
I'm trying to build a Korean speech synthesis system with HTS (together with Festival, SPTK, etc.) using a Korean database.
Several kinds of data must be prepared for training (for example, ".lab", ".raw", ".utt", and ".win" files), and the utterance file (.utt) is one of them.
I found that six files are needed to build an utterance file: '.Segment', '.Syllable', '.Word', '.Phrase', '.IntEvent', and '.Target'.
(You can verify this on the 'Festvox Mailing Lists' page: http://www.festvox.org/maillists.html)
The following are brief descriptions and examples of them:
Segment
segment labels with (near) correct boundaries, in the phone set of your language.
File1.Segment
#
0.583812 121 pau
0.665375 121 D
0.738812 121 e
0.871687 121 f
0.918312 121 t
1.04994 121 E
1.09619 121 r
1.197 121 a
......
......
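To make the layout above concrete, here is a minimal sketch of a reader for files in this style. It assumes each data line after the "#" line is "end-time, a numeric middle field, label", optionally followed by ";"-separated features as in the Syllable and Word examples. The function name parse_xlabel is only an illustrative choice, not part of Festival or HTS.

```python
def parse_xlabel(text):
    """Parse label-file text into (end_time, label, features) tuples.

    Assumed layout (taken from the examples in this post): everything up
    to the line "#" is header; each later line is
    "<end time> <numeric field> <label> [; features]".
    """
    entries = []
    in_body = False
    for raw in text.splitlines():
        line = raw.strip()
        if not in_body:
            # Skip the header until the "#" separator line is seen.
            in_body = (line == "#")
            continue
        main, _, feats = line.partition(";")
        fields = main.split()
        if len(fields) < 3:
            continue  # blank lines or elision marks such as "......"
        try:
            end_time = float(fields[0])
        except ValueError:
            continue  # not a data line
        # fields[1] is the numeric middle field; it is ignored here.
        entries.append((end_time, " ".join(fields[2:]), feats.strip()))
    return entries


if __name__ == "__main__":
    sample = "#\n0.583812 121 pau\n0.665375 121 D\n0.738812 121 e\n"
    for end_time, label, feats in parse_xlabel(sample):
        print(end_time, label, feats)
```

The same function can read the .Syllable, .Word, .Phrase, and .IntEvent examples, since they share this layout.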
Syllable
Syllables, with stress marking (if appropriate) whose boundaries are closely aligned with the segment boundaries.
File1.Syllable
#
0.738812 121 D.e ; stress 0 ;
1.04994 121 f.t.E ; stress 1 ;
1.197 121 r.a ; stress 0 ;
.....
.....
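The requirement that syllable boundaries line up with segment boundaries can be checked mechanically. The sketch below assumes syllable labels are the member phones joined by "." (as in "D.e" above) and that "pau" marks silence belonging to no syllable. The function name check_syllables and the tolerance value are hypothetical choices for illustration.

```python
def check_syllables(segments, syllables, tol=1e-3, silence="pau"):
    """Return a list of mismatches between syllable and segment labels.

    segments:  list of (end_time, phone) in time order
    syllables: list of (end_time, "p1.p2...") in time order
    A syllable is assumed to contain every non-silence segment that
    ends at or before the syllable's end time (within tol seconds).
    """
    problems = []
    i = 0
    for syl_end, syl_label in syllables:
        phones = []
        while i < len(segments) and segments[i][0] <= syl_end + tol:
            if segments[i][1] != silence:
                phones.append(segments[i][1])
            i += 1
        found = ".".join(phones)
        if found != syl_label:
            problems.append((syl_end, syl_label, found))
    return problems


if __name__ == "__main__":
    segments = [(0.583812, "pau"), (0.665375, "D"), (0.738812, "e"),
                (0.871687, "f"), (0.918312, "t"), (1.04994, "E"),
                (1.09619, "r"), (1.197, "a")]
    syllables = [(0.738812, "D.e"), (1.04994, "f.t.E"), (1.197, "r.a")]
    print(check_syllables(segments, syllables))
```

An empty result means every syllable label matches the segments it spans.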
Word
Words with boundaries (closely) aligned to the syllables and segments. By words we mean the things which can be looked up in a lexicon; thus "1986" would not be considered a word and should be rendered as three words, "nineteen eighty six".
File1.Word
#
1.197 121 DeftEra ; wordlab "1"
.......
......
Phrase
A name and marking for the end of each prosodic phrase.
File1.Phrase
#
1.197 77 2
.....
......
IntEvent
Intonation labels aligned to a syllable (either within the syllable boundary or explicitly naming the syllable they should align to). If using ToBI (or some derivative) these would be standard ToBI labels, while in something like Tilt these would be "a" and "b" marking accents and boundaries.
File1.IntEvent
#
1.197 77 L*+H
1.77462 77 L*+H
2.24356 77 L*+H
....
....
Target
The mean F0 value in Hertz at the mid-point of each segment in the utterance.
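As a rough illustration of the Target description, the sketch below computes the mid-point time of each segment from the end times in a .Segment file and looks up an F0 value there. It assumes the first segment starts at 0.0 seconds and that F0 is available as a list of values in Hz sampled at a fixed frame shift. Both helper names and the 5 ms default are hypothetical and are not taken from Festival or HTS.

```python
def segment_midpoints(segments):
    """Return (mid_time, label) for each (end_time, label) segment.

    Assumes segments are in time order and the first one starts at 0.0 s.
    """
    mids = []
    start = 0.0
    for end, label in segments:
        mids.append(((start + end) / 2.0, label))
        start = end
    return mids


def f0_at(time, f0_track, frame_shift=0.005):
    """Return the F0 value (Hz) of the frame nearest to `time`.

    f0_track is assumed to hold one value per frame, with frame k
    located at k * frame_shift seconds.
    """
    index = min(int(round(time / frame_shift)), len(f0_track) - 1)
    return f0_track[index]


if __name__ == "__main__":
    segments = [(0.5, "pau"), (1.0, "a")]
    f0_track = [100.0] * 100 + [200.0] * 100  # made-up values for the demo
    for mid, label in segment_midpoints(segments):
        print(label, mid, f0_at(mid, f0_track))
```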
The problem is that I have only the '.Segment' and '.Phrase' files in my Korean database.
So, here's my big question!
"When HTS uses the utterance files during training, will it be possible (or OK) to generate speech if I build the utterance files from only those two files (.Segment and .Phrase)?"
If the answer is yes, I'll look for a way to build the utterance files from only those two files, even though the "make_utts" script under festival/examples, which builds utterance files, states that all six files must be prepared.
I would appreciate any comments on this.
Thank you in advance.
Best regards,
Chung
Follow-Ups:
- [hts-users:00757] Re: Problem on data preparation (building utterance file ".utt"), Heiga ZEN (Byung Ha CHUN)