## Correction in State Duration Modeling for HMM-Based Speech Synthesis

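In [1], [2], we defined $\chi_{t_0,t_1}(i)$, the probability of staying at state $i$ from time $t_0$ to $t_1$ given an observation sequence $O = (o_1, o_2, \ldots, o_T)$ of length $T$, as

$$
\chi_{t_0,t_1}(i) = \left(1 - \gamma_{t_0-1}(i)\right) \cdot \prod_{t=t_0}^{t_1} \gamma_t(i) \cdot \left(1 - \gamma_{t_1+1}(i)\right),
$$

where $\gamma_t(i)$ is the probability of being in state $i$ at time $t$, and we defined $\gamma_0(i) = \gamma_{T+1}(i) = 0$.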
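
For concreteness, the previous definition can be sketched in NumPy as follows. The sketch assumes the state posteriors $\gamma_t(i)$ are already available (e.g. from a forward-backward pass) as an array `gamma` of shape `(T, N)`; the function name `chi_old` and the `(N, T, T)` output layout are illustrative choices, not part of [1], [2].

```python
import numpy as np


def chi_old(gamma):
    """Superseded definition of chi from [1], [2] (illustrative sketch).

    gamma: array of shape (T, N); gamma[t - 1, i] holds gamma_t(i) for t = 1..T.
    Returns chi of shape (N, T, T) with chi[i, t0 - 1, t1 - 1] for t0 <= t1
    (entries with t0 > t1 are left at zero).
    """
    T, N = gamma.shape
    # Pad so that g[t] = gamma_t for t = 0..T+1, with gamma_0 = gamma_{T+1} = 0.
    g = np.vstack([np.zeros((1, N)), gamma, np.zeros((1, N))])
    chi = np.zeros((N, T, T))
    for i in range(N):
        for t0 in range(1, T + 1):
            prod = 1.0  # running product of gamma_t(i) for t = t0..t1
            for t1 in range(t0, T + 1):
                prod *= g[t1, i]
                chi[i, t0 - 1, t1 - 1] = (1.0 - g[t0 - 1, i]) * prod * (1.0 - g[t1 + 1, i])
    return chi
```

Padding `gamma` with a zero row at each end implements the boundary convention, so the same expression covers runs that touch the first or last frame.
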
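
Based on $\chi_{t_0,t_1}(i)$, the mean $m_i$ and the variance $\sigma_i^2$ of the state duration density $p_i(d)$ of state $i$ are obtained as

$$
m_i = \frac{\sum_{t_0=1}^{T} \sum_{t_1=t_0}^{T} \chi_{t_0,t_1}(i)\,(t_1 - t_0 + 1)}{\sum_{t_0=1}^{T} \sum_{t_1=t_0}^{T} \chi_{t_0,t_1}(i)},
$$

$$
\sigma_i^2 = \frac{\sum_{t_0=1}^{T} \sum_{t_1=t_0}^{T} \chi_{t_0,t_1}(i)\,(t_1 - t_0 + 1)^2}{\sum_{t_0=1}^{T} \sum_{t_1=t_0}^{T} \chi_{t_0,t_1}(i)} - m_i^2.
$$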
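
Given the $\chi_{t_0,t_1}(i)$ values for one state, the two formulas are simply a $\chi$-weighted mean and variance of the run length $d = t_1 - t_0 + 1$. A minimal sketch, assuming `chi` is a `(T, T)` array indexed as `chi[t0 - 1, t1 - 1]` (a layout chosen here for illustration); with several training utterances one would presumably accumulate the numerator and denominator sums over all of them before dividing.

```python
import numpy as np


def duration_stats(chi):
    """Mean and variance of the state duration density of one state.

    chi: array of shape (T, T) with chi[t0 - 1, t1 - 1] = chi_{t0,t1}(i);
    only the upper triangle (t0 <= t1) is used.
    """
    T = chi.shape[0]
    t = np.arange(1, T + 1)
    t0, t1 = np.meshgrid(t, t, indexing="ij")  # t0 varies along rows, t1 along columns
    d = t1 - t0 + 1                           # run length t1 - t0 + 1
    w = np.where(t1 >= t0, chi, 0.0)          # drop entries with t0 > t1
    total = w.sum()
    mean = (w * d).sum() / total
    var = (w * d ** 2).sum() / total - mean ** 2
    return mean, var
```

The sketch does not depend on how $\chi$ was obtained, so it applies equally to the previous and to the corrected definition.
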
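
However, the previous definition of $\chi_{t_0,t_1}(i)$ is statistically incorrect because the state transitions were not taken into account: it multiplies the per-frame posteriors $\gamma_t(i)$ as if they were independent across time, whereas consecutive states are coupled through the transition probabilities.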
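
A small hypothetical case illustrates the problem. Take two states and $T = 2$, and suppose switching is impossible ($a_{11} = a_{22} = 1$) while the initial and output probabilities make both states equally likely, so the posterior puts probability $1/2$ on each of the state sequences $(q_1, q_2) = (1, 1)$ and $(2, 2)$. Then $\gamma_1(1) = \gamma_2(1) = 1/2$, and the previous definition, with $\gamma_0(1) = \gamma_3(1) = 0$, gives

$$
\begin{aligned}
\chi_{1,1}(1) &= \left(1 - \gamma_0(1)\right)\gamma_1(1)\left(1 - \gamma_2(1)\right) = 1 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4}, \\
\chi_{2,2}(1) &= \left(1 - \gamma_1(1)\right)\gamma_2(1)\left(1 - \gamma_3(1)\right) = \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot 1 = \tfrac{1}{4}, \\
\chi_{1,2}(1) &= \left(1 - \gamma_0(1)\right)\gamma_1(1)\,\gamma_2(1)\left(1 - \gamma_3(1)\right) = 1 \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot 1 = \tfrac{1}{4}, \\
m_1 &= \frac{\tfrac{1}{4} \cdot 1 + \tfrac{1}{4} \cdot 1 + \tfrac{1}{4} \cdot 2}{\tfrac{3}{4}} = \tfrac{4}{3}.
\end{aligned}
$$

Under this model, however, state 1 is occupied either for both frames or not at all, so the true values are $\chi_{1,2}(1) = 1/2$ and $\chi_{1,1}(1) = \chi_{2,2}(1) = 0$, giving $m_1 = 2$. The product of marginals assigns probability to runs of length 1 that cannot occur.
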

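We redefine $\chi_{t_0,t_1}(i)$ in a statistically correct manner as

$$
\begin{aligned}
\chi_{t_0,t_1}(i) &= P\left(q_{t_0-1} \neq i,\, q_{t_0} = i,\, \ldots,\, q_{t_1} = i,\, q_{t_1+1} \neq i \mid O, \lambda\right) \\
&= \frac{1}{P(O \mid \lambda)} \sum_{j \neq i} \alpha_{t_0-1}(j)\, a_{ji} \prod_{t=t_0}^{t_1} b_i(o_t) \cdot a_{ii}^{\,t_1-t_0} \sum_{k \neq i} a_{ik}\, b_k(o_{t_1+1})\, \beta_{t_1+1}(k),
\end{aligned}
$$

where $q_t$ denotes the state at time $t$, $\lambda$ denotes the parameter set of the HMM, $\alpha_t(\cdot)$ and $\beta_t(\cdot)$ denote the forward and backward variables, and $a_{ij}$ and $b_i(\cdot)$ denote the state transition probability and the output probability, respectively.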
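
The redefined $\chi_{t_0,t_1}(i)$ can be computed from the forward and backward variables. The NumPy sketch below assumes a discrete-time HMM given by initial probabilities `pi`, a transition matrix `A` with `A[i, j]` $= a_{ij}$, and precomputed output probabilities `B[i, t - 1]` $= b_i(o_t)$. These names, and the boundary conventions (the entry term becomes $\pi_i$ at $t_0 = 1$ and the exit term becomes 1 at $t_1 = T$, i.e. no explicit final state), are assumptions made here, not statements from the note. As a check, `chi_brute_force` enumerates all state sequences and accumulates the posterior probability of every maximal run of a state, which is exactly the event in the first line of the definition.

```python
import itertools

import numpy as np


def forward_backward(pi, A, B):
    """Unscaled forward/backward variables (adequate for short toy sequences only).

    pi: (N,) initial state probabilities; A: (N, N) with A[i, j] = a_ij;
    B: (N, T) with B[i, t - 1] = b_i(o_t). Row t - 1 of alpha/beta holds time t.
    """
    N, T = B.shape
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))  # beta_T(i) = 1
    alpha[0] = pi * B[:, 0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, t]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, t + 1] * beta[t + 1])
    return alpha, beta, alpha[-1].sum()  # last value is P(O | lambda)


def chi_corrected(pi, A, B):
    """chi[i, t0-1, t1-1] = P(q_{t0-1} != i, q_{t0..t1} = i, q_{t1+1} != i | O, lambda)."""
    N, T = B.shape
    alpha, beta, prob = forward_backward(pi, A, B)
    chi = np.zeros((N, T, T))
    for i in range(N):
        others = np.arange(N) != i  # boolean mask selecting states j != i
        for t0 in range(1, T + 1):
            # Entry term sum_{j != i} alpha_{t0-1}(j) a_ji; assumed to be pi_i at t0 = 1.
            entry = pi[i] if t0 == 1 else alpha[t0 - 2, others] @ A[others, i]
            stay = entry * B[i, t0 - 1]
            for t1 in range(t0, T + 1):
                if t1 > t0:
                    stay *= A[i, i] * B[i, t1 - 1]  # one more self-transition and output
                # Exit term sum_{k != i} a_ik b_k(o_{t1+1}) beta_{t1+1}(k); assumed 1 at t1 = T.
                if t1 == T:
                    leave = 1.0
                else:
                    leave = A[i, others] @ (B[others, t1] * beta[t1, others])
                chi[i, t0 - 1, t1 - 1] = stay * leave / prob
    return chi


def chi_brute_force(pi, A, B):
    """Same quantity by enumerating all N**T state sequences (tiny N, T only)."""
    N, T = B.shape
    chi = np.zeros((N, T, T))
    total = 0.0
    for path in itertools.product(range(N), repeat=T):
        p = pi[path[0]] * B[path[0], 0]
        for t in range(1, T):
            p *= A[path[t - 1], path[t]] * B[path[t], t]
        total += p
        start = 0  # 0-based start of the current maximal run
        for t in range(1, T + 1):
            if t == T or path[t] != path[start]:
                chi[path[start], start, t - 1] += p
                start = t
    return chi / total
```

The enumeration costs $O(N^T)$ and is only meant for verifying the recursion on tiny examples; the forward-backward version needs $O(N^2 T^2)$ operations for all states and all $(t_0, t_1)$ pairs. For realistic utterance lengths, scaled or log-domain forward and backward variables would be needed to avoid underflow.
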

### References

- [1] T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," IEICE Trans. D-II, vol. J83-D-II, no. 11, pp. 2099–2107, Nov. 2000 (in Japanese).
- [2] T. Yoshimura, T. Masuko, K. Tokuda, T. Kobayashi, and T. Kitamura, "Duration modeling for HMM-based speech synthesis," Proc. ICSLP-98, vol. 2, Tu3A4, pp. 29–32, Nov. 1998.