[hts-users:03381] Vowel duration modeling

Dear HTS users,
I want to develop models for the long versions of vowels in my language. I have recorded utterances containing both long and short vowels, labeled the long and short vowels as different phones, and trained separate models for them.
I also made the corresponding changes to the text analyzer. When the system is trained on a corpus of 40 utterances (all of which contain long vowels), it works normally.
But when I add these 40 utterances to a 2500-utterance corpus (which doesn't have long vowels), there's no distinction between long and short vowels in the synthesised speech.
What could be the problem in this case?
