
[hts-users:00756] Problem on data preparation (building utterance file ".utt")


 
Hello to everyone working on HTS,
 
 I'm trying to build a Korean speech synthesis system using HTS (together with Festival, SPTK, etc.) and a Korean database.
 Several kinds of data have to be prepared for training (for example, ".lab", ".raw", ".utt" and ".win" files), and the utterance file (.utt) is one of them.
 I found that I need six files to make an utterance file: '.Segment', '.Syllable', '.Word', '.Phrase', '.IntEvent' and '.Target'.
 (You can verify this on the 'Festvox Mailing Lists': http://www.festvox.org/maillists.html)
 
 The following are brief descriptions and examples of each:
 
  Segment
      
      segment labels with (near) correct boundaries, in the phone set of your language.
 
       File1.Segment
       #
             0.583812           121 pau
             0.665375           121 D
             0.738812           121 e
             0.871687           121 f
             0.918312           121 t
             1.04994            121 E
             1.09619            121 r
             1.197              121 a
            ......
            ......
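
 For reference, here is a minimal Python sketch of how a file in this layout could be read. It assumes that everything up to the lone "#" line is header, that each following line is "end-time, second column, label", and that the second column (121 above) can be ignored; the function name parse_label_file is my own and not part of Festival or HTS.

```python
def parse_label_file(text):
    """Parse label-file text into a list of (end_time, label) tuples."""
    entries = []
    in_body = False
    for line in text.splitlines():
        line = line.strip()
        if not in_body:
            # Assumption: everything up to the lone '#' line is header.
            if line == "#":
                in_body = True
            continue
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # skip blank or elided lines such as "......"
        try:
            end_time = float(fields[0])
        except ValueError:
            continue  # not a data line
        # fields[1] is the second column (121 in the example); ignored here.
        # Anything after ';' is treated as features and dropped in this sketch.
        label = fields[2].split(";", 1)[0].strip()
        entries.append((end_time, label))
    return entries


example = """#
    0.583812   121 pau
    0.665375   121 D
    0.738812   121 e
    ......
"""
segments = parse_label_file(example)
```

 Each entry holds only the end time, so the start of a segment is taken to be the end time of the previous one.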

 Syllable
 
      Syllables, with stress marking (if appropriate) whose boundaries are closely aligned with the segment boundaries.
 
      File1.Syllable
      #
            0.738812 121 D.e ; stress 0 ;
            1.04994 121 f.t.E ; stress 1 ;
            1.197 121 r.a ; stress 0 ;
            .....
            .....
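
 Since the syllable boundaries are supposed to line up with segment boundaries, a small check like the following sketch could flag syllables that do not. The 1 ms tolerance and the function name unaligned_boundaries are my own assumptions, and the times are the ones from the examples above.

```python
def unaligned_boundaries(segment_ends, syllable_ends, tolerance=0.001):
    """Return syllable end times with no segment end within `tolerance` seconds."""
    return [
        t for t in syllable_ends
        if not any(abs(t - s) <= tolerance for s in segment_ends)
    ]


# End times taken from the File1.Segment and File1.Syllable examples.
segment_ends = [0.583812, 0.665375, 0.738812, 0.871687,
                0.918312, 1.04994, 1.09619, 1.197]
syllable_ends = [0.738812, 1.04994, 1.197]

problems = unaligned_boundaries(segment_ends, syllable_ends)
```

 An empty result means every syllable ends on (or very near) a segment boundary.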

 Word
 
      Words with boundaries aligned (closely) to the syllables and segments. By words we mean the things that can be looked up in a lexicon; thus "1986" would not be considered a word and should be rendered as three words, "nineteen eighty six".
 
      File1.Word
      #
             1.197        121 DeftEra ; wordlab "1" 
             .......
             ......

 Phrase
      A name and marking for the end of each prosodic phrase.
 
      File1.Phrase
      #
            1.197        77 2
            .....
            ......
 
 IntEvent
 
      Intonation labels aligned to a syllable (either within the syllable boundary or explicitly naming the syllable they should align to). If using ToBI (or some derivative) these would be standard ToBI labels, while in something like Tilt these would be "a" and "b" marking accents and boundaries.
 
      File1.IntEvent
      #
             1.197               77 L*+H
             1.77462             77 L*+H
             2.24356             77 L*+H
            ....
            ....

  Target
 
       The mean F0 value in Hertz at the mid-point of each segment in the utterance.
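
       As a rough Python sketch of how such values could be obtained from the segment end times and an F0 track: the midpoint of each segment is computed, and the F0 frame nearest to it is read. The frame-based track, the 5 ms frame shift and both function names are my assumptions, and taking the single nearest frame is a simplification of the "mean" value described above.

```python
def segment_midpoints(end_times, start=0.0):
    """Return the mid-point time of each segment, given consecutive end times."""
    mids = []
    prev = start
    for end in end_times:
        mids.append((prev + end) / 2.0)
        prev = end
    return mids


def f0_at(time, f0_track, frame_shift=0.005):
    """Return the F0 value (Hz) of the frame nearest to `time` (seconds).

    Assumption: f0_track[i] is the F0 at time i * frame_shift.
    """
    index = min(int(round(time / frame_shift)), len(f0_track) - 1)
    return f0_track[index]


# Synthetic track for illustration only: frame i has "F0" equal to i.
f0_track = [float(i) for i in range(400)]
mids = segment_midpoints([1.0, 2.0])
targets = [f0_at(t, f0_track) for t in mids]
```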
 
 
 
 
 
  The problem is that I have only the '.Segment' and '.Phrase' files in my Korean database.
 
  So, here is my main question:
 
  "When HTS uses the utterance file during training, is it possible (or OK) to generate speech if I build the utterance file from only those two files (.Segment and .Phrase)?"
 
   If the answer is yes, I will look for a way to build the utterance file from only those two files, even though the "make_utts" script under festival/examples, which builds utterance files, says that all six files must be prepared.
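
   To see which of the six files are available for a given utterance, a small sketch like this could be used. It assumes that all files for one utterance sit in a single directory and are named "<basename>.<Relation>" (as in File1.Segment above); the function name missing_relations is my own.

```python
from pathlib import Path

# The six file types listed above.
RELATIONS = ["Segment", "Syllable", "Word", "Phrase", "IntEvent", "Target"]


def missing_relations(directory, basename):
    """Return the relation names for which no '<basename>.<Relation>' file exists."""
    directory = Path(directory)
    return [
        name for name in RELATIONS
        if not (directory / f"{basename}.{name}").is_file()
    ]


# Usage (hypothetical path):
# missing_relations("labels", "File1")
```

   In my case this would list Syllable, Word, IntEvent and Target as missing.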
 
  
   I would appreciate any comments on this.
 
   Thank you in advance.
 
 
 
 Best regards,
 
 Chung



Follow-Ups
[hts-users:00757] Re: Problem on data preparation (building utterance file ".utt"), Heiga ZEN (Byung Ha CHUN)