首页> 外文会议>Discovery science >Graph Clustering Based on Optimization of a Macroscopic Structure of Clusters
【24h】

Graph Clustering Based on Optimization of a Macroscopic Structure of Clusters

机译:基于聚类宏观结构优化的图聚类

获取原文
获取原文并翻译 | 示例

摘要

A graph is a flexible data structure for various data, such as the Web, SNSs and molecular architectures. Not only the data expressed naturally by a graph, it is also used for data which does not have explicit graph structures by extracting implicit relationships hidden in data, e.g. co-occurrence relationships of words in text and similarity relationships of pixels of an image. By the extraction, we can make full use of many sophisticated methods for graphs to solve a wide range of problems. In analysis of graphs, the graph clustering problem is one of the most important problems, which is to divide all vertices of a given graph into some groups called clusters. Existing algorithms for the problem typically assume that the number of intra-cluster edges is large while the number of inter-cluster edges is absolutely small. Therefore these algorithms fail to do clustering in case of noisy graphs, and the extraction of implicit relationships tends to yield noisy ones because it is subject to a deflnition of a relation among vertices. Instead of such an assumption, we introduce a macroscopic structure (MS), which is a graph of clusters and roughly describes a structure of a given graph. This paper presents a graph clustering algorithm which, given a graph and the number of clusters, tries to find a set of clusters such that the distance between an MS induced from calculated clusters and the ideal MS for the given number of clusters is minimized. In other words, it solves the clustering problem as an optimization problem. For the m-clustering problem, the ideal MS is denned as an m-vertex graph such that each vertex has only a self-loop. To confirm the performance improvements exhaustively, we conducted experiments with artiGcial graphs with different amounts of noise. The results show that our method can handle very noisy graphs correctly while existing algorithms completely failed to do clustering. Furthermore, even for graphs with less noise, our algorithm treats them well if the difference between edge densities of intra-cluster edges and those of inter-cluster edges are sufficiently big. We also did experiments on graphs transformed from vector data as a more practical case. From the results we found that our algorithm, indeed, works much better on noisy graphs than the existing ones.
机译:图形是用于各种数据的灵活数据结构,例如Web,SNS和分子体系结构。通过提取隐藏在数据中的隐式关系,不仅图形自然表示的数据还用于没有显式图形结构的数据。文字中单词的共现关系和图像像素的相似关系。通过提取,我们可以充分利用许多复杂的图形方法来解决各种问题。在图的分析中,图聚类问题是最重要的问题之一,它是将给定图的所有顶点划分为称为聚类的某些组。用于该问题的现有算法通常假定集群内边缘的数量大而集群间边缘的数量绝对小。因此,这些算法在有噪图的情况下无法进行聚类,并且隐式关系的提取往往会产生有噪图,因为它受顶点之间关系的定义的影响。代替这种假设,我们引入了宏观结构(MS),它是簇的图,并粗略地描述了给定图的结构。本文提出了一种图聚类算法,该算法在给定图和聚类数量的情况下,试图找到一组聚类,以使计算聚类所诱导的MS与给定聚类数量的理想MS之间的距离最小。换句话说,它解决了聚类问题作为优化问题。对于m聚类问题,理想的MS被定义为m顶点图,以便每个顶点只有一个自环。为了详尽地确认性能改进,我们使用了具有不同噪声量的人工图形进行了实验。结果表明,我们的方法可以正确处理非常嘈杂的图,而现有算法完全无法进行聚类。此外,即使对于噪声较小的图,如果集群内边缘和集群间边缘的边缘密度之间的差异足够大,我们的算法也可以很好地对其进行处理。作为更实际的案例,我们还对从矢量数据转换而来的图形进行了实验。从结果中我们发现,实际上,我们的算法在噪声图上比现有的算法要好得多。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号