【24h】

Pairwise Data Clustering and Applications

机译:成对数据聚类和应用程序

获取原文
获取外文期刊封面目录资料

摘要

Data clustering is an important theoretical topic and a sharp tool for various applications. Its main objective is to partition a given data set into clusters such that the data within the same cluster are "more" similar to each other with respect to certain measures. In this paper, we study the pairwise data clustering problem with pairwise similarity/dissimilarity measures that need not satisfy the triangle inequality. By using a criterion, called the minimum normalized cut, we model the pairwise data clustering problem as a graph partition problem. The graph partition problem based on minimizing the normalized cut is known to be NP-hard. We present a ((4 + o(1)) ln n)-approximation polynomial time algorithm for the minimum normalized cut problem. We also give a more efficient algorithm for this problem by sacrificing the approximation ratio slightly. Further, our scheme achieves a ((2 + o(1)) ln n)-approximation polynomial time algorithm for computing the sparsest cuts in edge-weighted and vertex-weighted undirected graphs, improving the previously best known approximation ratio by a constant factor.
机译:数据集群是各种应用的重要理论主题和急剧的工具。其主要目标是将给定的数据分为集群,使得同一群集内的数据相对于某些措施相似地与彼此类似。在本文中,我们使用不需要满足三角形不等式的成对相似性/不同措施来研究成对数据聚类问题。通过使用称为最小归一化切割的标准,我们将成对数据聚类问题模拟为图形分区问题。已知基于最小化归一化切割的图分区问题是NP-HARD。我们呈现((4 + O(1))ln n) - 用于最小归一化切割问题的批量多项式时间算法。我们还通过稍微牺牲近似比来给出更有效的算法。此外,我们的方案实现了((2 + O(1))Ln N) - 用于计算边缘加权和顶点加权的无向图中的稀疏性切割的多项式时间算法,通过恒定因子提高先前最佳已知的近似比。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号