[hts-users:03965] Re: Decision trees Construction
- Subject: [hts-users:03965] Re: Decision trees Construction
- From: Keiichiro Oura <uratec@xxxxxxxxxxxx>
- Date: Wed, 11 Dec 2013 23:17:36 +0900
- Cc: uratec <uratec@xxxxxxxxxxxx>
- Delivered-to: hts-users@xxxxxxxxxxxxxxx
Hi,
ML-based context clustering is described in many papers.
For example, ...
J. J. Odell, “The Use of Context in Large Vocabulary Speech Recognition,”
PhD dissertation, Cambridge University, 1995.
K. Shinoda and T. Watanabe, “MDL-based context-dependent subword
modeling for speech recognition,” J. Acoust. Soc. Jpn. (E),
vol. 21, no. 2, pp. 79–86, 2000.
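To make the criterion concrete: for a pool modeled by a single ML-fitted Gaussian with diagonal covariance, the Mahalanobis terms sum to N*d, so the pool's log likelihood depends only on the per-dimension variances. The "best question" is the one maximizing the gain L(yes-pool) + L(no-pool) - L(pool). A minimal sketch (function and variable names are illustrative, not from HTK/HTS code):

```python
import numpy as np

def pool_log_likelihood(x):
    """Log likelihood of pool x (shape N x d) under its own ML-fitted
    single Gaussian with diagonal covariance.  With ML parameters the
    Mahalanobis terms sum to N*d, so:
        L(S) = -0.5 * N * (d*log(2*pi) + sum_j log var_j + d)
    """
    n, d = x.shape
    var = x.var(axis=0) + 1e-8   # ML (biased) variance, floored for stability
    return -0.5 * n * (d * np.log(2 * np.pi) + np.log(var).sum() + d)

def split_gain(pool, answers):
    """Increase in training-data log likelihood when a yes/no context
    question (boolean mask `answers`) splits `pool` into two sub-pools,
    each re-fitted with its own diagonal Gaussian.  Both sub-pools are
    assumed non-empty."""
    yes, no = pool[answers], pool[~answers]
    return (pool_log_likelihood(yes)
            + pool_log_likelihood(no)
            - pool_log_likelihood(pool))
```

Tree construction then evaluates `split_gain` for every candidate question at a node and applies the argmax, stopping when the best gain falls below a threshold (or, in the MDL variant, when the gain no longer offsets the description-length penalty).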
Regards,
Keiichiro Oura
2013/12/10 Ibrahim Sobh <im_sobh@xxxxxxxxxxx>:
> Hi
>
> Regarding clustering (tree for MGC feature vector for example):
>
> 1-We start with a pool of all MGC vectors from all states/streams.
> 2-Each MGC vector is represented as a single Gaussian with diagonal covariance.
> 3-Then, we apply "context questions" to split this pool into 2 sub-pools,
> each with a collection of Gaussians.
> 4-We select “the best question” and apply it.
> 5-We repeat splitting until some stop criteria or threshold.
>
>
> **My question is:
> How exactly do we select the “best question”? It is mentioned, in the HTK book
> and in Odell's work, that the best question is the one that “maximizes the
> increase in log likelihood of the training data”. I do not understand this
> line! How is this calculated (mathematical formula)? P(sub-pool|pool)?! Or
> some sort of distance between the Gaussians belonging to the same pool?
>
>
> **Note:
> In the case of “normal” decision tree construction, I understand how we use
> entropy and information gain measures every time we apply a question and
> then decide “the best question” according to the largest information gain.
> Example: the ID3 decision tree learning algorithm.
>
> Regards
- References
- [hts-users:03961] HTS With STRAIGHT, Karthik Krishnan
- [hts-users:03962] Re: HTS With STRAIGHT, Keiichiro Oura
- [hts-users:03963] Decision trees Construction, Ibrahim Sobh