I have read the paper "A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis" and I am confused about the global variance (GV) distribution. According to the paper, the GV distribution for each parameter is a single Gaussian, but the HTS demo trains a decision tree for the GV distribution.
When a context-dependent distribution is used for GV, which part of the parameter generation algorithm changes?
I would appreciate a pointer to a useful reference on this.
Thanks in advance.