Non-Exhaustive, Overlapping Clustering

Abstract

Traditional clustering algorithms, such as K-Means, output a clustering that is disjoint and exhaustive, i.e., every single data point is assigned to exactly one cluster. However, in many real-world datasets, clusters can overlap and there are often outliers that do not belong to any cluster. While this is a well-recognized problem, most existing algorithms address either overlap or outlier detection and do not tackle the problem in a unified way. In this paper, we propose an intuitive objective function, which we call the NEO-K-Means (Non-Exhaustive, Overlapping K-Means) objective, that captures the issues of overlap and non-exhaustiveness in a unified manner. Our objective function can be viewed as a reformulation of the traditional K-Means objective, with easy-to-understand parameters that capture the degrees of overlap and non-exhaustiveness. By considering an extension to weighted kernel K-Means, we show that we can also apply our NEO-K-Means idea to overlapping community detection, which is an important task in network analysis. To optimize the NEO-K-Means objective, we develop not only fast iterative algorithms but also more sophisticated algorithms using low-rank semidefinite programming techniques. Our experimental results show that the new objective and algorithms are effective in finding ground-truth clusterings that have varied overlap and non-exhaustiveness; for the case of graphs, we show that our method outperforms state-of-the-art overlapping community detection algorithms.
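The abstract does not state the objective or the iterative algorithm explicitly, so the following is only a minimal sketch under stated assumptions. It assumes the objective is the usual sum of squared distances over all point-to-cluster assignments, with two parameters (here called `alpha` and `beta`, hypothetical names) that fix the total number of assignments at roughly (1 + alpha) * n and allow at most beta * n points to remain unassigned. The function name `neo_kmeans`, its signature, and the greedy assignment rule are illustrative choices, not the authors' reference implementation.

```python
import numpy as np


def neo_kmeans(X, k, alpha, beta, n_iter=50, seed=0):
    """Sketch of a Lloyd-style iteration for non-exhaustive, overlapping clustering.

    Assumptions (not taken from the abstract):
      * alpha >= 0 controls overlap: total assignments = n + floor(alpha * n)
      * beta >= 0 controls non-exhaustiveness: at most floor(beta * n) points
        are left without any cluster
    Returns a boolean membership matrix U (n x k), the centers, and the objective.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(n, size=k, replace=False)].copy()

    n_forced = n - int(np.floor(beta * n))                        # points that must get a cluster
    n_extra = int(np.floor(alpha * n)) + int(np.floor(beta * n))  # remaining free assignments

    def sq_dists(C):
        # Squared Euclidean distance from every point to every center, shape (n, k).
        return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)

    U = np.zeros((n, k), dtype=bool)
    for _ in range(n_iter):
        D = sq_dists(centers)
        U_new = np.zeros((n, k), dtype=bool)

        # Step 1: the n_forced points closest to some center join their nearest cluster.
        nearest = D.argmin(axis=1)
        order = np.argsort(D[np.arange(n), nearest], kind="stable")[:n_forced]
        U_new[order, nearest[order]] = True

        # Step 2: spend the remaining assignments on the smallest unused distances.
        D_free = np.where(U_new, np.inf, D)
        flat = np.argsort(D_free, axis=None, kind="stable")[:n_extra]
        U_new[np.unravel_index(flat, D.shape)] = True

        if np.array_equal(U_new, U):
            break
        U = U_new

        # Step 3: recompute each center as the mean of its (possibly shared) members.
        for j in range(k):
            if U[:, j].any():
                centers[j] = X[U[:, j]].mean(axis=0)

    objective = float((sq_dists(centers) * U).sum())
    return U, centers, objective
```

With `alpha = 0` and `beta = 0` the sketch reduces to ordinary K-Means assignments, where every point belongs to exactly one cluster. Rows of `U` that are all `False` correspond to points treated as outliers, and rows with more than one `True` correspond to points in an overlap region.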
