Clustering XML documents by structure has generally been accomplished by looking at the occurrence of a single, pre-established type of structural component in the structures of the XML documents. Focusing on only one type of structural component is likely to produce clusters with some degree of internal structural inhomogeneity, either because differences in the structures of the XML documents go uncaught or because the chosen structural component is inappropriate. To overcome these limitations, a new parameter-free approach to clustering XML documents is proposed that considers multiple types of structural components simultaneously in order to isolate structurally homogeneous clusters of XML documents. The idea behind the approach is to represent each XML document as a transaction of boolean features that highlight a suitable selection of its structural components. A parameter-free clustering scheme is then used to isolate structurally homogeneous clusters. A comparative evaluation over both real and synthetic XML data provides evidence of the effectiveness and efficacy of the devised approach.
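The transactional representation can be illustrated with a minimal sketch. The abstract does not say which structural components are selected, so the two feature types used here (root-to-node tag paths and parent-child tag edges) are assumptions chosen only for illustration, and the function names `structural_features` and `to_transactions` are hypothetical. The parameter-free clustering scheme itself is not reproduced.

```python
import xml.etree.ElementTree as ET


def structural_features(xml_text):
    """Return the set of structural features found in one XML document.

    Assumption: two illustrative feature types are used, tag paths from
    the root and parent-child tag edges. The source does not specify them.
    """
    root = ET.fromstring(xml_text)
    features = set()

    def visit(node, path):
        path = path + (node.tag,)
        # Feature type 1: the tag path from the root down to this node.
        features.add(("path", "/".join(path)))
        for child in node:
            # Feature type 2: a parent-child edge between two tags.
            features.add(("edge", node.tag, child.tag))
            visit(child, path)

    visit(root, ())
    return features


def to_transactions(docs):
    """Encode each document as a row of booleans over a shared vocabulary."""
    feature_sets = [structural_features(d) for d in docs]
    vocabulary = sorted(set().union(*feature_sets))
    rows = [[f in fs for f in vocabulary] for fs in feature_sets]
    return vocabulary, rows


docs = ["<a><b/><c/></a>", "<a><b><c/></b></a>"]
vocabulary, rows = to_transactions(docs)
for feature, in_first, in_second in zip(vocabulary, rows[0], rows[1]):
    print(feature, in_first, in_second)
```

Both sample documents contain the same three tags, yet they differ on several path and edge features. This is the kind of structural difference that a single feature type might miss and that a combination of types can expose.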