【24h】

Correlation Clustering with Partial Information

机译:与部分信息的相关聚类

获取原文
获取外文期刊封面目录资料

摘要

We consider the following general correlation-clustering problem: given a graph with real edge weights (both positive and negative), partition the vertices into clusters to minimize the total absolute weight of cut positive edges and uncut negative edges. Thus, large positive weights (representing strong correlations between endpoints) encourage those endpoints to belong to a common cluster; large negative weights encourage the endpoints to belong to different clusters; and weights with small absolute value represent little information. In contrast to most clustering problems, correlation clustering specifies neither the desired number of clusters nor a distance threshold for clustering; both of these parameters are effectively chosen to be the best possible by the problem definition. Correlation clustering was introduced by Bansal, Blum, and Chawla, motivated by both document clustering and agnostic learning. They proved NP-hardness and gave constant-factor approximation algorithms for the special case in which the graph is complete (full information) and every edge has weight +1 or -1. We give an O(log n)-approximation algorithm for the general case based on a linear-programming rounding and the "region-growing" technique. We also prove that this linear program has a gap of Ω(log n), and therefore our approximation is tight under this approach. We also give an O(r~3)-approximation algorithm for K_(r,r)-minor-free graphs. On the other hand, we show that the problem is APX-hard, and any o(log n)-approximation would require improving the best approximation algorithms known for minimum multicut.
机译:我们考虑以下一般相关聚类问题:给定具有实际边缘权重的图表(正负),将顶点分隔成簇以最小化切割正边缘和未切割负边缘的总绝对重量。因此,大的积极权重(表示端点之间的强相关性)鼓励那些终点属于共同的群集;大负重鼓励终点属于不同的集群;和小绝对值的重量代表很少的信息。与大多数聚类问题相比,相关群集既不指定所需的簇数,也不是群集的距离阈值;通过问题定义有效地选择这些参数的两个参数。 Bansal,Blum和Chawla引入了相关聚类,由文档聚类和无话用学习引发。它们证明了NP硬度,并为特殊情况提供了恒因子近似算法,其中图形完成(完整信息),每个边缘都有重量+1或-1。我们基于线性编程舍入和“地区生长”技术,给出了一般案例的O(log n) - 千克估计算法。我们还证明,该线性程序具有Ω(log n)的间隙,因此在这种方法下我们的近似是紧的。我们还给出了k_(r,r)-minor图形的o(r〜3) - k_(r)的估计算法。另一方面,我们表明问题是APX-Hard,任何O(log n) - 千克估计都需要改进最小多型已知的最佳近似算法。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号