## Correction in State Duration Modeling for HMM-Based Speech Synthesis

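In [1], [2], we defined $\chi_{t_0,t_1}(i)$, the probability of staying at state $i$ from time $t_0$ to $t_1$ given an observation sequence of length $T$, as

$$
\chi_{t_0,t_1}(i) = \bigl(1 - \gamma_{t_0-1}(i)\bigr) \cdot \prod_{t=t_0}^{t_1} \gamma_t(i) \cdot \bigl(1 - \gamma_{t_1+1}(i)\bigr),
$$

where $\gamma_t(i)$ is the probability of being in state $i$ at time $t$, and we defined $\gamma_0(i) = \gamma_{T+1}(i) = 0$. Based on $\chi_{t_0,t_1}(i)$, the mean $\xi(i)$ and the variance $\sigma^2(i)$ of the state duration density of state $i$ are obtained as

$$
\xi(i) = \frac{\sum_{t_0=1}^{T} \sum_{t_1=t_0}^{T} \chi_{t_0,t_1}(i)\,(t_1 - t_0 + 1)}{\sum_{t_0=1}^{T} \sum_{t_1=t_0}^{T} \chi_{t_0,t_1}(i)},
$$

$$
\sigma^2(i) = \frac{\sum_{t_0=1}^{T} \sum_{t_1=t_0}^{T} \chi_{t_0,t_1}(i)\,(t_1 - t_0 + 1)^2}{\sum_{t_0=1}^{T} \sum_{t_1=t_0}^{T} \chi_{t_0,t_1}(i)} - \xi^2(i).
$$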
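
The mean and variance computation can be sketched in a few lines of NumPy. This is only an illustration, assuming the staying probabilities for one state are stored in a $T \times T$ array `chi` with zero-based indices, where `chi[t0, t1]` holds the probability of staying in the state from time `t0` to `t1`. The function name `duration_stats` and the array layout are hypothetical and do not come from [1], [2].

```python
import numpy as np


def duration_stats(chi):
    """Return (mean, variance) of the state duration for one state.

    chi[t0, t1] is the probability of staying in the state from t0 to t1
    (zero-based, t0 <= t1). Entries with t1 < t0 are ignored.
    """
    chi = np.triu(np.asarray(chi, dtype=float))  # keep only t1 >= t0
    T = chi.shape[0]
    t0, t1 = np.indices((T, T))
    d = t1 - t0 + 1  # duration of the segment [t0, t1]
    total = chi.sum()
    mean = (chi * d).sum() / total
    var = (chi * d**2).sum() / total - mean**2
    return mean, var
```

Because both statistics are normalized by the total mass of `chi`, the array does not have to sum to one.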

However, the previous definition of $\chi_{t_0,t_1}(i)$ is statistically incorrect because the state transitions were not taken into account.

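We redefine $\chi_{t_0,t_1}(i)$ in a statistically correct manner as

$$
\begin{aligned}
\chi_{t_0,t_1}(i) &= P(q_{t_0-1} \neq i,\ q_{t_0} = i,\ \ldots,\ q_{t_1} = i,\ q_{t_1+1} \neq i \mid O, \lambda) \\
&= \frac{\displaystyle \sum_{j \neq i} \alpha_{t_0-1}(j)\, a_{ji} \cdot a_{ii}^{\,t_1-t_0} \prod_{t=t_0}^{t_1} b_i(o_t) \cdot \sum_{k \neq i} a_{ik}\, b_k(o_{t_1+1})\, \beta_{t_1+1}(k)}{P(O \mid \lambda)},
\end{aligned}
$$

where $O = (o_1, \ldots, o_T)$ is the observation sequence, $q_t$ denotes the state at time $t$, $\lambda$ denotes the parameter set of the HMM, $\alpha_t(i)$ and $\beta_t(i)$ denote the forward and backward variables, and $a_{ij}$ and $b_i(o_t)$ denote the state transition probability and the output probability, respectively.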
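
The redefined probability can be evaluated with the usual forward-backward recursions. The following is a minimal NumPy sketch, not the authors' implementation. It assumes unscaled forward and backward variables (adequate only for short sequences), an initial state distribution `pi`, a transition matrix `A`, and a precomputed table `B[t, i]` of output probabilities. All function and variable names are hypothetical. The boundary handling is also an assumption: a segment starting at the first frame uses `pi[i]` in place of the incoming sum, and a segment ending at the last frame uses 1 in place of the outgoing sum. A brute-force enumeration over all state sequences is included as a check.

```python
import itertools
import numpy as np


def forward_backward(pi, A, B):
    """Unscaled forward/backward variables; B[t, i] is b_i(o_t)."""
    T, N = B.shape
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    alpha[0] = pi * B[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1])
    return alpha, beta


def chi_corrected(pi, A, B, i):
    """chi[t0, t1]: posterior that state i is entered at t0 and left after t1."""
    T, N = B.shape
    alpha, beta = forward_backward(pi, A, B)
    likelihood = alpha[-1].sum()  # P(O | lambda)
    others = np.arange(N) != i    # mask selecting all states except i
    chi = np.zeros((T, T))
    for t0 in range(T):
        # Arrive in state i at t0 from a different state (assumed: pi[i] at t0 = 0).
        enter = pi[i] if t0 == 0 else alpha[t0 - 1, others] @ A[others, i]
        stay = enter * B[t0, i]
        for t1 in range(t0, T):
            if t1 > t0:
                stay *= A[i, i] * B[t1, i]  # one more self-transition
            if t1 == T - 1:
                leave = 1.0  # assumed: nothing follows the last frame
            else:
                leave = A[i, others] @ (B[t1 + 1, others] * beta[t1 + 1, others])
            chi[t0, t1] = stay * leave / likelihood
    return chi


def chi_brute_force(pi, A, B, i):
    """Same quantity by enumerating every state sequence (tiny T and N only)."""
    T, N = B.shape
    chi = np.zeros((T, T))
    total = 0.0
    for path in itertools.product(range(N), repeat=T):
        p = pi[path[0]] * B[0, path[0]]
        for t in range(1, T):
            p *= A[path[t - 1], path[t]] * B[t, path[t]]
        total += p
        t = 0
        while t < T:
            if path[t] == i:
                t0 = t
                while t + 1 < T and path[t + 1] == i:
                    t += 1
                chi[t0, t] += p  # maximal run of state i over [t0, t]
            t += 1
    return chi / total


# Random toy model: 3 states, 5 frames.
rng = np.random.default_rng(0)
N, T = 3, 5
pi = rng.dirichlet(np.ones(N))
A = rng.dirichlet(np.ones(N), size=N)   # each row sums to one
B = rng.uniform(0.1, 1.0, size=(T, N))  # stand-in output probabilities
assert np.allclose(chi_corrected(pi, A, B, 1), chi_brute_force(pi, A, B, 1))
```

For sequences of realistic length, the unscaled recursions underflow, so a scaled or log-domain variant would be needed.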

### References

1. T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," IEICE Trans. D-II, vol. J83-D-II, no. 11, pp. 2099–2107, Nov. 2000 (in Japanese).
2. T. Yoshimura, T. Masuko, K. Tokuda, T. Kobayashi, and T. Kitamura, "Duration modeling for HMM-based speech synthesis," Proc. ICSLP-98, vol. 2, Tu3A4, pp. 29–32, Nov. 1998.