Several information-theory-based measures have been used in machine learning. Building on the definition of the Kullback-Leibler entropy, this paper presents a new measure for clustering objects: the attribute redundancy measure. First, an introduction to clustering is given, covering its interpretation from the machine learning point of view and a classification of clustering techniques. Then, the use of information-theory-based measures in machine learning, both in supervised and in unsupervised learning, is described, including the application of mutual information. Next, the new measure is presented, highlighting its ability to capture relations between attributes and outlining its closeness to other concepts of information theory. Finally, using a genetic algorithm as the search procedure to find the best clustering, a comparison between the attribute redundancy measure and mutual information is made.
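For reference, the two standard information-theoretic quantities the abstract invokes are (these are the textbook definitions for discrete distributions; the paper's own notation may differ):

```latex
% Kullback-Leibler divergence of P from Q:
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_{x} P(x) \log \frac{P(x)}{Q(x)}

% Mutual information between X and Y, expressed as the KL divergence
% of the joint distribution from the product of its marginals:
I(X; Y) \;=\; D_{\mathrm{KL}}\!\left(P_{XY} \,\|\, P_X P_Y\right)
       \;=\; \sum_{x,y} P(x,y) \log \frac{P(x,y)}{P(x)\,P(y)}
```

The second identity shows why mutual information is a natural baseline for a KL-derived measure of redundancy between attributes: both quantify how far a joint distribution departs from independence.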