An expressive dissimilarity measure for relational clustering using neighbourhood trees

Dumancic Sebastijan; Blockeel Hendrik

Abstract

Clustering is an underspecified task: there are no universal criteria for what makes a good clustering. This is especially true for relational data, where similarity can be based on the features of individuals, the relationships between them, or a mix of both. Existing methods for relational clustering have strong and often implicit biases in this respect. In this paper, we introduce a novel dissimilarity measure for relational data. It is the first approach to incorporate a wide variety of types of similarity, including similarity of attributes, similarity of relational context, and proximity in a hypergraph. We experimentally evaluate the proposed dissimilarity measure on both clustering and classification tasks using data sets of very different types. Considering the quality of the obtained clustering, the experiments demonstrate that (a) using this dissimilarity in standard clustering methods consistently gives good results, whereas other measures work well only on data sets that match their bias; and (b) on most data sets, the novel dissimilarity outperforms even the best among the existing ones. On the classification tasks, the proposed method outperforms the competitors on the majority of data sets, often by a large margin. Moreover, we show that learning the appropriate bias in an unsupervised way is a very challenging task, and that the existing methods offer a marginal gain compared to the proposed similarity method, and can even hurt performance. Finally, we show that the asymptotic complexity of the proposed dissimilarity measure is similar to the existing state-of-the-art approaches. The results confirm that the proposed dissimilarity measure is indeed versatile enough to capture relevant information, regardless of whether that comes from the attributes of vertices, their proximity, or connectedness of vertices, even without parameter tuning.
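The claim that the dissimilarity can be used in standard clustering methods can be illustrated with a minimal sketch. This is not the authors' implementation, and the neighbourhood-tree construction is not reproduced here. The three component matrices (attribute, relational context, proximity), the equal weights, the toy values, and the helper names `combine` and `average_linkage` are all assumptions made for illustration. The sketch only shows how several notions of dissimilarity might be mixed into one matrix and passed to an off-the-shelf style agglomerative clustering.

```python
from itertools import combinations


def combine(components, weights):
    """Weighted sum of pairwise dissimilarity matrices (lists of lists).

    Hypothetical helper: each component is assumed to be symmetric and
    scaled to [0, 1], and the weights are assumed to sum to 1.
    """
    n = len(components[0])
    return [[sum(w * c[i][j] for w, c in zip(weights, components))
             for j in range(n)]
            for i in range(n)]


def average_linkage(dist, k):
    """Agglomerative clustering (average linkage) on a precomputed matrix."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        def linkage(pair):
            # Mean pairwise dissimilarity between the two candidate clusters.
            a, b = clusters[pair[0]], clusters[pair[1]]
            return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

        # Merge the closest pair; x < y, so deleting y keeps x valid.
        x, y = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return sorted(sorted(c) for c in clusters)


# Toy data invented for illustration: four vertices, three views.
attribute = [[0.0, 0.1, 0.9, 0.8],
             [0.1, 0.0, 0.8, 0.9],
             [0.9, 0.8, 0.0, 0.2],
             [0.8, 0.9, 0.2, 0.0]]
context = [[0.0, 0.2, 0.7, 0.9],
           [0.2, 0.0, 0.9, 0.7],
           [0.7, 0.9, 0.0, 0.1],
           [0.9, 0.7, 0.1, 0.0]]
proximity = [[0.0, 0.3, 1.0, 1.0],
             [0.3, 0.0, 1.0, 1.0],
             [1.0, 1.0, 0.0, 0.3],
             [1.0, 1.0, 0.3, 0.0]]

combined = combine([attribute, context, proximity], [1 / 3, 1 / 3, 1 / 3])
clusters = average_linkage(combined, k=2)
print(clusters)  # [[0, 1], [2, 3]]
```

Changing the weights shifts the bias toward attributes, relational context, or proximity, which is the kind of bias the abstract says existing methods fix implicitly.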

Bibliographic details

Source: Machine Learning, 2017, Issue 10, pp. 1523-1545 (23 pages)
Authors: Dumancic Sebastijan; Blockeel Hendrik
Affiliation (listed for both authors): Katholieke Univ Leuven, Dept Comp Sci, Celestijnenlaan 200A, Heverlee, Belgium
Original format: PDF
Language: English
Keywords: Relational learning; Clustering; Similarity of structured objects

Similar literature

1. A note on 'Similarity and dissimilarity measures between fuzzy sets: A formal relational study' and 'Additive similarity and dissimilarity measures' [J]. Couso Ines, Sanchez Luciano. Fuzzy Sets and Systems, 2020, Jul. 1 issue.
2. A fuzzy relational clustering algorithm based on a dissimilarity measure extracted from data [J]. Corsini P., Lazzerini B., Marcelloni F. IEEE Transactions on Systems, Man, and Cybernetics. Part B, Cybernetics, 2004, Issue 1.
3. Multivalued type dissimilarity measure and concept of mutual dissimilarity value for clustering symbolic patterns [J]. Guru DS, Kiranagi BB. Pattern Recognition: The Journal of the Pattern Recognition Society, 2005, Issue 1.
4. An Expressive Dissimilarity Measure for Relational Clustering Using Neighbourhood Trees [C]. Sebastijan Dumancic, Hendrik Blockeel. European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2017.
5. A Relational Framework for Clustering and Cluster Validity and the Generalization of the Silhouette Measure [D]. Rawashdeh, Mohammad Y. 2013.
6. A multi-labeled tree dissimilarity measure for comparing clonal trees of tumor progression [O]. Nikolai Karpov, Salem Malikic, Md. Khaledur Rahman, 2019.
7. An expressive dissimilarity measure for relational clustering using neighbourhood trees [O]. Dumancic Sebastijan, Blockeel Hendrik. 2017.
