Single-label classification associates each instance with a single label, whereas multi-label classification (MLC) assigns multiple labels to each instance. Simple MLC systems assume that labels are independent of one another, while more complex approaches capture inter-dependencies among labels. Experiments comparing the performance of MLC systems demonstrate that there is much room for improvement. Notably, when an instance is associated with multiple labels, a feature value of the instance may depend on only a subset of these labels and thus be conditionally independent of the others given that label subset. Current systems do not account for such conditional independence. Moreover, the dependence of a feature value on a label is likely to imply its dependence on other labels that are inter-dependent with it. Our hypothesis is that explicitly modeling the dependence between feature values and specific subsets of inter-dependent labels allows multiple labels to be assigned to instances more accurately. We present a probabilistic generative model that uses a Bayesian network to capture dependencies among labels as well as between features and labels. We introduce the concept of label dependency sets as the basis for a new mixture model that represents conditional independencies between features and labels given subsets of inter-dependent labels. Experimental results show that the MLC system we developed based on our model significantly improves upon the results obtained by current MLC systems that are based on probabilistic models.
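The two independence assumptions contrasted above can be written out as a rough sketch. The notation is an assumption introduced here for illustration and is not taken from the paper: $x = (x_1, \dots, x_d)$ is the feature vector of an instance, $y = (y_1, \dots, y_L)$ is its label vector, and $S_j$ is a hypothetical subset of label indices on which feature value $x_j$ depends.

```latex
% Simple MLC systems: labels treated as independent of one another
% (assumed notation; an illustrative factorization only)
P(y_1, \dots, y_L \mid x) \approx \prod_{k=1}^{L} P(y_k \mid x)

% Conditional independence of a feature value given a label subset:
% x_j depends only on the labels indexed by S_j
P(x_j \mid y_1, \dots, y_L) = P\bigl(x_j \mid \{ y_k \}_{k \in S_j}\bigr),
\qquad S_j \subseteq \{1, \dots, L\}
```

The second line states only the conditional-independence property itself; how the label subsets are chosen and how they enter the mixture model is not specified by this sketch.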