
[hts-users:03965] Re: Decision trees Construction


Hi,

ML-based context clustering is described in many papers.
For example, ...

J. J. Odell, “The Use of Context in Large Vocabulary Speech Recognition,”
PhD dissertation, Cambridge University, 1995.

K. Shinoda and T. Watanabe, “MDL-based context-dependent subword
modeling for speech recognition,” J. Acoust. Soc. Jpn. (E),
vol. 21, no. 2, pp. 79–86, 2000.
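
In brief: each pool S of states is modeled by a single Gaussian fitted
by maximum likelihood to the pooled statistics of its members. With the
ML estimate plugged in, the log likelihood of the data assigned to S
reduces to

  L(S) = -1/2 * N(S) * (d * (1 + log(2*pi)) + log|Sigma(S)|)

where N(S) is the total state occupancy, d is the feature dimension,
and Sigma(S) is the pooled covariance. A question q splits S into
S_yes and S_no, and the "increase of log likelihood" is

  Delta(q) = L(S_yes) + L(S_no) - L(S)

The best question is the one that maximizes Delta(q). No distance
between Gaussians and no P(sub-pool|pool) is needed; everything comes
from occupancy-weighted means and variances, so Delta(q) can be
evaluated cheaply for every question.

Below is a minimal, illustrative Python sketch of this computation for
diagonal covariances (the data layout and names are my own assumptions,
not HTS code):

import numpy as np

# One "state" in the pool: (context_label, occupancy, mean, diag_var).
# This tuple layout is illustrative only, not taken from HTS.

def pool_gaussian(states):
    # Fit one diagonal-covariance Gaussian to the pooled statistics.
    gamma = sum(g for _, g, _, _ in states)
    mean = sum(g * m for _, g, m, _ in states) / gamma
    ex2 = sum(g * (v + m * m) for _, g, m, v in states) / gamma
    return gamma, mean, ex2 - mean * mean

def log_likelihood(states):
    # L(S) = -1/2 * N(S) * (d * (1 + log(2*pi)) + sum_i log(var_i))
    gamma, _, var = pool_gaussian(states)
    d = var.size
    return -0.5 * gamma * (d * (1.0 + np.log(2.0 * np.pi))
                           + np.sum(np.log(var)))

def gain(pool, question):
    # Increase in log likelihood if `question` (a predicate on the
    # context label) splits `pool` into yes/no sub-pools.
    yes = [s for s in pool if question(s[0])]
    no = [s for s in pool if not question(s[0])]
    if not yes or not no:
        return float("-inf")
    return log_likelihood(yes) + log_likelihood(no) - log_likelihood(pool)

# The best question is simply:
#   best = max(questions, key=lambda q: gain(pool, q))

For the full derivation (including the occupancy weighting and the MDL
stopping criterion), see the two references above.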

Regards,
Keiichiro Oura



2013/12/10 Ibrahim Sobh <im_sobh@xxxxxxxxxxx>:
> Hi
>
> Regarding clustering (tree for MGC feature vector for example):
>
> 1-We start with a pool of all MGC vectors from all states/streams.
> 2-Each MGC vector is represented as a single Gaussian with diagonal
> covariance.
> 3-Then, we apply "context questions" to split this pool into 2 sub-pools,
> each with a collection of Gaussians.
> 4-We select “the best question” and apply it.
> 5-We repeat splitting until some stopping criterion or threshold is met.
>
>
> **My question is:
> How exactly do we select the “best question”? It is mentioned, in the HTK
> book and in Odell's work, that the best question is the one that “maximizes
> the increase of log likelihood of the training data”. I do not understand
> this line! How is this calculated (what is the mathematical formula)? Is it
> P(sub-pool|pool)?! Or some sort of distance between Gaussians belonging to
> the same pool?
>
>
> **Note:
> In the case of “normal” decision tree construction, I understand how we use
> entropy and information-gain measures every time we apply a question and
> then decide “the best question” according to the largest information gain.
> Example: decision tree learning with the ID3 algorithm.
>
> Regards

References
[hts-users:03961] HTS With STRAIGHT, Karthik Krishnan
[hts-users:03962] Re: HTS With STRAIGHT, Keiichiro Oura
[hts-users:03963] Decision trees Construction, Ibrahim Sobh