[hts-users:03937] Re: MDL vs ML

Subject: [hts-users:03937] Re: MDL vs ML

From: Xavi Gonzalvo <xavi.gonzalvo@xxxxxxxxx>

Date: Tue, 19 Nov 2013 20:55:04 +0000

Delivered-to: hts-users@xxxxxxxxxxxxxxx

The MDL criterion is an effective way to select the optimal probabilistic model from among various models. When it is used for decision-tree clustering in HMM-TTS, the likelihood term of the ML criterion is kept, but MDL adds a penalty for larger trees.

Suppose we are given a sequence of N data points x = {x_1, ..., x_N}. As an estimation problem, we could say that we are looking for the model that has generated this data. In other words, we try to estimate a vector of parameters θ = [θ_1, ..., θ_L] of a statistical model P_θ(x) for the data x. To choose among candidate models, MDL selects the statistical model with the minimum description length for the given data. The description length D_j(x) for data x of an underlying probabilistic model j is given by

    D_j(x) = -\log P_{\hat{\theta}^{(j)}}(x) + \frac{L_j}{2} \log N

where:
• \hat{\theta}^{(j)} is the ML estimate of the parameter vector θ for model j.

• L_j is the number of parameters in \hat{\theta}^{(j)}, that is, the number of free parameters of probabilistic model j.
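
For tree clustering, the same quantity can be written as a per-split test. The lines below are only a sketch: they assume the usual two-term description length (negative log-likelihood plus (L_j / 2) log N), and the names "parent", "split" and \Delta L (the number of parameters a split adds) are introduced here for illustration.

```latex
\Delta D = D_{\mathrm{split}}(x) - D_{\mathrm{parent}}(x)
         = -\bigl(\log P_{\mathrm{split}}(x) - \log P_{\mathrm{parent}}(x)\bigr)
           + \frac{\Delta L}{2} \log N
```

Under that assumption a split is worth keeping only when \Delta D < 0, i.e. when the log-likelihood gain exceeds (\Delta L / 2) \log N.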

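One of the advantages of the MDL criterion is that the second term of the description length works as a penalty for employing a large model. As a model becomes more complex, the first term (the negative log-likelihood) decreases and the second term increases, so the minimum is reached at a compromise between fit and model size. With ML alone, the likelihood never gets worse when a node is split, so the tree would keep growing until some externally chosen threshold stops it. With MDL, splitting stops by itself once the likelihood gain no longer outweighs the penalty.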
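
To make the trade-off concrete, here is a minimal Python sketch of one split decision. It is not the HTS implementation: it assumes 1-D data, a single Gaussian per node with 2 free parameters (mean and variance), and the two-term description length -log L + (L_j / 2) log N. The function names and the toy data are made up for illustration.

```python
import math


def gaussian_log_likelihood(data):
    """Log-likelihood of data under a 1-D Gaussian with ML mean and variance."""
    n = len(data)
    mean = sum(data) / n
    var = sum((v - mean) ** 2 for v in data) / n
    var = max(var, 1e-8)  # variance floor so log() stays finite
    # With ML estimates the log-likelihood reduces to this closed form.
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def description_length(log_likelihood, num_params, num_points):
    """Assumed two-term description length: -log L + (L_j / 2) * log N."""
    return -log_likelihood + 0.5 * num_params * math.log(num_points)


def compare_split(yes_group, no_group, params_per_node=2):
    """Compare one Gaussian for all data against one Gaussian per group.

    yes_group / no_group stand for the data that answers yes / no to a
    (hypothetical) context question. Both groups must be non-empty.
    """
    pooled = list(yes_group) + list(no_group)
    n = len(pooled)
    ll_parent = gaussian_log_likelihood(pooled)
    ll_split = gaussian_log_likelihood(yes_group) + gaussian_log_likelihood(no_group)
    dl_parent = description_length(ll_parent, params_per_node, n)
    dl_split = description_length(ll_split, 2 * params_per_node, n)
    return {
        "ll_gain": ll_split - ll_parent,    # ML view: gain from splitting
        "dl_change": dl_split - dl_parent,  # MDL view: negative means "split"
    }


# A question that separates nothing useful: interleaved values.
uninformative = compare_split(
    [float(v) for v in range(0, 20, 2)],
    [float(v) for v in range(1, 20, 2)],
)
# A question that separates two clearly different groups.
informative = compare_split(
    [float(v) for v in range(10)],
    [float(v) for v in range(100, 110)],
)
print("uninformative:", uninformative)
print("informative:  ", informative)
```

In both cases ll_gain is positive, so a pure ML rule would accept both splits unless a threshold is set by hand. dl_change is positive for the first split and negative for the second, so MDL rejects the first and accepts the second.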

See:

Rissanen, J. (1984). Universal coding, information, prediction, and estimation. IEEE Transactions on Information Theory, 30:629–636.

2013/11/19 Tóth Bálint <toth.b@xxxxxxxxxxx>

Dear All,

Is there a reason why MDL (Minimum Description Length) is preferred over ML (Maximum Likelihood) for building decision trees in HMM-TTS?

Thank you for your answer in advance!

Best Regards,
Balint Toth

--
Xavi.