[hts-users:04450] Training Data For Speech Synthesis

I need to prepare training data for building a speech synthesis system for a new language. I saw the English demo on the HTS website, which has its training data under the data directory. This directory contains several different types of training data. I want to know how I can build this type of training data for a new language. I need to build a TTS system for the Urdu language.