Hello,
I'm trying to build a Catalan voice for Festival based on the HTS system, but the speech quality of the HTS model is far from perfect. I have tried adjusting some of the model's parameters, but I can't figure out what the problem is, since the same setup works quite well on the ARCTIC database.
Some information about the training database: it contains 5500 seconds of speech, labeled using SphinxTrain (there is some misalignment), and uses a new phoneset defined for the Catalan language. We are currently building a larger, hand-labeled database and are using this one as a temporary solution.
I would also like to ask whether the HTS module uses the duration, f0model, intonation and phrasing modules defined in the festvox directory when synthesizing the voice.
If you need more information, do not hesitate to ask. Thank you very much!
Best,
Oriol