Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

Abstract

Overview

Notions of community quality underlie the clustering of networks. While studies of network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes.

Cluster Quality Metrics

We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on the information recovery metrics. Additionally, our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information.

Network Clustering Algorithms

Smart local moving is the overall best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it an absolutely superior algorithm. Interestingly, Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
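To make the two metric families named in the abstract concrete, the following is a minimal sketch and not the authors' code. It assumes an undirected, unweighted graph given as an edge list, non-overlapping clusters given as one label per node, and at least two nodes. The function names, the toy graph, and the choice of arithmetic-mean normalization for NMI are illustrative assumptions. The "variant of normalized mutual information used in previous work" is not implemented here.

```python
from collections import Counter
from math import comb, log

# Stand-alone metrics: need only the graph and a clustering.
def coverage(edges, label):
    """Fraction of edges whose endpoints share a cluster."""
    return sum(1 for u, v in edges if label[u] == label[v]) / len(edges)

def modularity(edges, label):
    """Newman-Girvan modularity: sum over clusters of L_c/m - (d_c/2m)^2."""
    m = len(edges)
    intra, degree_sum = Counter(), Counter()
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            intra[label[u]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

def conductance(edges, label, cluster):
    """Cut size of one cluster divided by the smaller side's total degree."""
    cut = vol_in = vol_out = 0
    for u, v in edges:
        for node in (u, v):
            if label[node] == cluster:
                vol_in += 1
            else:
                vol_out += 1
        if (label[u] == cluster) != (label[v] == cluster):
            cut += 1
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0

# Information recovery metrics: compare a clustering with ground truth.
def adjusted_rand(truth, pred):
    """Adjusted Rand score from pair counts (assumes len(truth) >= 2)."""
    n = len(truth)
    joint = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    a = sum(comb(c, 2) for c in Counter(truth).values())
    b = sum(comb(c, 2) for c in Counter(pred).values())
    expected = a * b / comb(n, 2)
    maximum = (a + b) / 2
    if maximum == expected:  # degenerate case, e.g. both are one cluster
        return 1.0
    return (joint - expected) / (maximum - expected)

def normalized_mutual_info(truth, pred):
    """NMI, normalized by the arithmetic mean of the two entropies
    (one common convention; an assumption of this sketch)."""
    n = len(truth)
    pt, pp = Counter(truth), Counter(pred)
    mi = sum(c / n * log(c * n / (pt[t] * pp[p]))
             for (t, p), c in Counter(zip(truth, pred)).items())
    h_t = -sum(c / n * log(c / n) for c in pt.values())
    h_p = -sum(c / n * log(c / n) for c in pp.values())
    mean_h = (h_t + h_p) / 2
    return mi / mean_h if mean_h > 0 else 1.0

# Toy graph (hypothetical): two triangles joined by one bridge edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
truth = [0, 0, 0, 1, 1, 1]   # label of node i is truth[i]
other = [0, 0, 1, 1, 2, 2]   # a different, made-up clustering

print(coverage(edges, truth), modularity(edges, truth),
      conductance(edges, truth, 0))
print(adjusted_rand(truth, other), normalized_mutual_info(truth, other))
```

On this toy graph the two-triangle clustering has coverage 6/7, modularity 5/14, and conductance 1/7 for either cluster. The stand-alone metrics can be computed without ground truth, whereas the information recovery metrics cannot. That distinction is why the abstract's question of how well the former predict the latter matters in practice.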