We propose a simple and intuitive clustering evaluation criterion based on the minimum description length principle, which yields a particularly simple way of describing and encoding a set of examples. The basic idea is to view a clustering as a restriction of the attribute domains, given an example's cluster membership. As a special operational case, we develop the so-called rectangular uniform message length measure, which can be used to evaluate clusterings described as sets of hyper-rectangles. We prove theoretically that this measure penalizes cluster boundaries placed in regions of uniform instance distribution (i.e., unintuitive clusterings), and we experimentally compare a simple clustering algorithm that uses this measure with the well-known algorithms KMeans and AutoClass.
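To make the idea of a clustering as a restriction of attribute domains concrete, the following is a minimal sketch of a message length computed over hyper-rectangles. It is not the measure defined in the paper: the function name `message_length`, the fixed coding precision `eps`, the use of each cluster's tightest bounding rectangle, and the exact cost terms are all assumptions made for illustration. Each example is charged for its cluster label and for its attribute values coded uniformly within its cluster's rectangle, and each rectangle is charged for its bounds.

```python
import math

def message_length(points, labels, domain, eps=0.01):
    """Illustrative bit count for encoding points under a rectangle clustering.

    points: list of tuples of attribute values
    labels: cluster label for each point
    domain: list of (low, high) pairs, one per attribute
    eps:    assumed coding precision (hypothetical parameter)
    """
    clusters = sorted(set(labels))
    k = len(clusters)
    dims = len(domain)

    # Model cost: two bounds per attribute for every rectangle,
    # each bound coded uniformly over the full attribute domain.
    bound_bits = sum(math.log2((hi - lo) / eps) for lo, hi in domain)
    total = k * 2 * bound_bits

    # Membership cost: a uniform code over the k clusters.
    total += len(points) * math.log2(k)

    # Data cost: each value coded uniformly within its cluster's range.
    for c in clusters:
        members = [p for p, l in zip(points, labels) if l == c]
        for j in range(dims):
            vals = [p[j] for p in members]
            width = max(max(vals) - min(vals), eps)
            total += len(members) * math.log2(width / eps)
    return total

# One attribute on [0, 1): evenly spread examples versus two separated groups.
uniform = [(i / 100,) for i in range(100)]
clumped = [(i / 100,) for i in range(10)] + [(0.9 + i / 100,) for i in range(10)]
domain = [(0.0, 1.0)]

uniform_one = message_length(uniform, [0] * 100, domain)
uniform_two = message_length(uniform, [0] * 50 + [1] * 50, domain)
clumped_one = message_length(clumped, [0] * 20, domain)
clumped_two = message_length(clumped, [0] * 10 + [1] * 10, domain)
```

Under these assumptions, splitting the evenly spread examples at the midpoint saves roughly one bit per value but costs one bit per example for the label plus the extra rectangle, so the split is longer overall. For the two separated groups the narrower rectangles save far more than the labels cost, so the two-cluster description is shorter.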