
[hts-users:04308] Re: HTS questions files for Arabic


Dear Dina A-A,

 

   As far as I know, there is no tutorial or document that specifically explains how to design a question file. Fortunately, I have experience designing a “full label format” and its accompanying question file for Mandarin Chinese, so I hope my experience will be helpful to you.

 

   Firstly, you need to define a “full label format” for Arabic. A good example is “lab_format.pdf”, released with the HTS demo. Designing a valid label format takes two steps: 1) choose the phonetic and linguistic features that are relevant and that you can actually extract with your text front end; 2) combine these features using a sequence of suitable separators, just as in “lab_format.pdf”. As a beginner, you should mimic the format shown in “lab_format.pdf” as closely as possible, because some special characters are reserved in the HTS toolkit, such as []\% #?/:{ },.’”. Unfortunately, no document, not even the HTK Book, explicitly explains these reservation rules; the only way to learn them is to study the core source code of the HTS tools.
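   As a minimal sketch of step 2, the Python snippet below joins a handful of features into one full-label string. The feature names and the layout are hypothetical and only loosely modelled on the separators in “lab_format.pdf”; your own front end will provide different features.

```python
# Hypothetical layout: (separator written BEFORE the field, feature name).
# The first field has no left separator. None of these names come from HTS.
LAYOUT = [
    ("", "prev_prev_phone"),
    ("^", "prev_phone"),
    ("-", "cur_phone"),
    ("+", "next_phone"),
    ("=", "next_next_phone"),
    ("/A:", "syl_pos_in_word"),
    ("_", "syl_count_in_word"),
]


def build_full_label(features):
    """Concatenate the features in LAYOUT order, each preceded by its separator."""
    parts = []
    for separator, name in LAYOUT:
        parts.append(separator + str(features[name]))
    return "".join(parts)


example_features = {
    "prev_prev_phone": "sil",
    "prev_phone": "b",
    "cur_phone": "a",
    "next_phone": "b",
    "next_next_phone": "sil",
    "syl_pos_in_word": 1,
    "syl_count_in_word": 2,
}

label = build_full_label(example_features)
print(label)  # sil^b-a+b=sil/A:1_2
```

   Keeping the layout in one table makes it easy to check that each separator is used only once before any labels are generated.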

   The full label format looks something like this:

   dPat sPat dPat sPat dPat sPat dPat sPat …

where dPat is a data pattern and sPat is a separator pattern. The first design principle concerns which ASCII characters are valid in dPat and sPat. A dPat should consist only of upper-case letters, lower-case letters, and digits, i.e., of characters matching the regular expression [a-zA-Z0-9], and it is not a good idea to distinguish two dPat values by case alone. An sPat should consist of special characters from the following set, i.e., of characters matching the regular expression [!#$&+-;=@^_|~]; an sPat can also be a section marker such as /A:, /B:, /C:, … The second principle is that every dPat must be uniquely determined, or “surrounded”, by its two adjacent sPat, because HHEd internally uses the C function strstr() to match full labels against questions. Failing to obey this principle leads to ambiguity in the decision-tree-based context clustering, which degrades the quality of the synthetic speech without any error being reported, since such a format is formally acceptable but semantically improper.
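   To illustrate both principles, here is a small Python sketch. It assumes, as described above, that matching behaves like a plain substring search; the helper names and the toy labels are hypothetical, and real HTS tools may match differently in detail.

```python
import re

# First principle: a dPat should contain only letters and digits.
DPAT_RE = re.compile(r"[a-zA-Z0-9]+")


def is_valid_dpat(value):
    """Return True if value consists only of ASCII letters and digits."""
    return DPAT_RE.fullmatch(value) is not None


def item_matches(question_item, full_label):
    """Substring test, standing in for the strstr()-style matching described above."""
    return question_item in full_label


# Ambiguous format: the same separator '-' is used between every phone.
ambiguous_label = "-".join(["sil", "b", "a", "t", "sil"])  # sil-b-a-t-sil
# Intended meaning: "is the CURRENT (third) phone b?" It matches anyway,
# because the previous phone b is also surrounded by '-' on both sides.
wrong_hit = item_matches("-b-", ambiguous_label)

# Unambiguous format: every position has its own pair of separators.
clear_label = "sil^b-a+t=sil"
current_is_b = item_matches("-b+", clear_label)
current_is_a = item_matches("-a+", clear_label)

print(wrong_hit, current_is_b, current_is_a)
```

   In the ambiguous format the question silently matches the wrong position, which is exactly the kind of problem that produces no error message but harms the clustering.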

   [Reference] Released lab_format.pdf

   [Hint] You can use the sample label format directly for the first version of your Arabic system.

 

   Secondly, each question is typically about one dPat. Each dPat has many associated questions, which are language dependent. E.g., if a dPat is the current phoneme, many questions may apply to it, such as: 1) Is the current dPat a vowel? 2) Is the current dPat voiced? 3) Is the current dPat a front vowel? If a dPat is the position in the current phrase, possible questions are: 1) Is the current dPat <= 1? 2) Is the current dPat <= 2? 3) Is the current dPat <= 3? 4) Is the current dPat <= 4? 5) Is the current dPat == 1? 6) Is the current dPat == 2? 7) Is the current dPat == 3? 8) Is the current dPat == 4? … The format of each atomic question item is “sPat dPat sPat”, except for the first and last dPat, where the left or right sPat, respectively, is missing. Carefully studying the released question file and the thesis listed below will show you how to implement this with good results.

   A question file consists of thousands of questions, and each question consists of one or more atomic question items.
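   Because writing thousands of questions by hand is tedious, they are usually generated by a script. The Python sketch below emits lines in the QS style of the sample question file, where each atomic item is “sPat dPat sPat” wrapped in * wildcards. The phone class, the question names, and the “@” and “_” separators around the position field are assumptions for illustration, not a real Arabic inventory or the released format.

```python
def qs_line(name, items):
    """Format one question: QS "name" {item1,item2,...}"""
    return 'QS "{}" {{{}}}'.format(name, ",".join(items))


def phone_class_question(name, phones, left="-", right="+"):
    """One question whose items are left-sPat + phone + right-sPat."""
    items = ["*{}{}{}*".format(left, phone, right) for phone in phones]
    return qs_line(name, items)


def numeric_questions(name, max_value, left="@", right="_"):
    """Emit '== n' and '<= n' questions for n = 1..max_value."""
    lines = []
    for n in range(1, max_value + 1):
        equal_items = ["*{}{}{}*".format(left, n, right)]
        lines.append(qs_line("{}=={}".format(name, n), equal_items))
        # '<= n' is expressed by listing every value from 1 to n.
        le_items = ["*{}{}{}*".format(left, k, right) for k in range(1, n + 1)]
        lines.append(qs_line("{}<={}".format(name, n), le_items))
    return lines


# Hypothetical vowel class; replace with your own phone set.
vowel_question = phone_class_question("C-Vowel", ["a", "i", "u"])
position_questions = numeric_questions("Pos_in_Phrase", 2)

print(vowel_question)
for line in position_questions:
    print(line)
```

   Note that a “<=” question is just a longer list of atomic items, which is why a single question may contain more than one item.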

   [Reference] 1. Released sample question file in HTS-demo.

   2. Odell, J. J., “The Use of Context in Large Vocabulary Speech Recognition,” PhD thesis, Cambridge University, 1995.

 

   Designing a full label format and its accompanying question file is not difficult, just a little tedious.

 

   Good luck!

 

Yang Wang

 

--------------------------------------------------------------------------

Yang WANG, PhD

National Laboratory of Pattern Recognition (NLPR), Institute of Automation of Chinese Academy of Sciences

45th cubicle, 7th Floor, Automation Building, 95# Zhongguancun East Road, Haidian District, Beijing, China

Email: yangwang@xxxxxxxxxxxxx

 

######################################################

BTW [Personal Advertisement of Applying for a Postdoc Researcher Position in Speech Synthesis/Recognition or Related Topics]

Dear reader,

 

I obtained my PhD degree in July 2015 from the Institute of Automation, Chinese Academy of Sciences, with a strong background in HMM-based speech synthesis. I am also eager to work on other research topics related to speech processing.

 

My homepage is http://www.escience.cn/people/yangwang

 

My covering letter, CV, detailed research interests, and a research proposal on speech synthesis are available upon request.

 

If it is possible for you or your institution to recruit me, would you please send me an email including your homepage? I would also highly appreciate it if you forwarded this email to other possibly interested readers.

 

Thanks a lot and best regards!

 

Sincerely yours,

 

Yang Wang



 



2015-08-27 6:42 GMT+08:00 Dina A-A <dina-a-a@xxxxxxxxxxx>:
 Hello,
I am a researcher working on speech synthesis for Arabic. One of the experiments I would like to do is to run HTS for Arabic.
I have prepared everything, and I need question files for Arabic.
If anyone knows a resource, a reference, or a person who can provide these files, or can offer any guidance on how to build them, this would be a great help to me, and I will credit it in my research under his/her name.
Many thanks for your cooperation


References
[hts-users:04305] HTS questions files for Arabic, Dina A-A