This is rarely mentioned, but you MUST first build a Festival unit selection voice for your language from your own data. The reasons are simple:
1) A unit selection voice helps you debug the phonetic transcription and segmentation (very important for voice quality). With a unit selection voice you can trace pronunciation issues directly to the source units, so you can figure out what is wrong in your training data annotation. With an HTS voice you will never know what went wrong; the quality will just be a bit worse.
2) Once you have a unit selection voice, dumping the features is simple.
The Festival documentation on building unit selection voices is available at http://festvox.org/bsv; it is not very complex.
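To illustrate how two of the data types in the question below relate: if the full-context labels follow the layout used in the HTS demo label files (an assumption on my part: each line is "start end p1^p2-p3+p4=p5@...", with p3 the current phone), the mono labels can be derived from the full ones. A minimal Python sketch; full_to_mono is only an illustrative name, not part of Festival or HTS:

```python
import re

# Assumption: each label line looks like "start end p1^p2-p3+p4=p5@...",
# where p3 (between the first "-" and the following "+") is the current phone.
CURRENT_PHONE = re.compile(r"-(.+?)\+")


def full_to_mono(line):
    """Reduce one time-stamped full-context label line to 'start end phone'."""
    start, end, context = line.split(None, 2)
    match = CURRENT_PHONE.search(context)
    if match is None:
        raise ValueError("no current phone found in: " + context)
    return "{} {} {}".format(start, end, match.group(1))


def convert_lines(lines):
    """Convert an iterable of full-context label lines, skipping blank ones."""
    return [full_to_mono(line) for line in lines if line.strip()]
```

For example, full_to_mono("0 500000 x^pau-k+a=t@1_2") keeps the two time stamps and reduces the context to the phone k. If your label format differs (no time stamps, other delimiters), the regular expression and the split need adjusting.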
16.07.2017, 10:15, "Atlas Khan" <atlaskhan90@xxxxxxxxx>:
I am working on speech synthesis for a language that does not have any support in Festival. Its phonemes and lexicon differ from those of English. I have recordings in raw format. As far as I know, I need the following types of data for speech synthesis with HTS:
  1. questions
  2. labels (full and mono)
  3. utt
I want to ask how I can prepare questions and labels for a language whose lexicon and phonemes differ from those of English. If I need Festival to generate that data, then how can I do that for a language that Festival does not support?
