
Scalable Overlapping Community Detection

Abstract

Recent advances in machine learning algorithms have transformed data analytics and provided innovative solutions to inherently difficult problems. However, training models at scale over large data sets remains a daunting challenge. One such problem is the detection of overlapping communities within graphs. For example, a social network can be modeled as a graph in which the vertices represent individuals and the edges represent their relationships. In contrast to graph partitioning or clustering, an individual can belong to multiple communities, which significantly increases the complexity of the problem. In this paper, we present and evaluate an efficient parallel and distributed implementation of a Stochastic Gradient Markov Chain Monte Carlo algorithm that solves the overlapping community detection problem. We show that the algorithm can scale to process graphs consisting of billions of edges and tens of millions of vertices on a compute cluster of 65 nodes. To the best of our knowledge, this is the first time that the problem of deducing overlapping communities has been addressed at such a large scale.
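
The contrast the abstract draws between partitioning and overlapping community detection can be illustrated with a minimal sketch. The vertex names, membership weights, and threshold below are invented for illustration, and the code is not the paper's Stochastic Gradient Markov Chain Monte Carlo implementation; it only shows how an overlapping assignment differs in structure from a disjoint one.

```python
from typing import Dict, List, Set

# Hypothetical toy data: per-vertex membership weights over 3 communities.
# Each row sums to 1; the values are made up for illustration only.
memberships: Dict[str, List[float]] = {
    "alice": [0.90, 0.05, 0.05],
    "bob":   [0.50, 0.45, 0.05],
    "carol": [0.05, 0.90, 0.05],
    "dave":  [0.05, 0.45, 0.50],
}


def partition(weights: Dict[str, List[float]]) -> Dict[str, int]:
    """Disjoint clustering: keep only each vertex's single strongest community."""
    return {v: max(range(len(w)), key=w.__getitem__) for v, w in weights.items()}


def overlapping(weights: Dict[str, List[float]], threshold: float = 0.3) -> Dict[str, Set[int]]:
    """Overlapping assignment: a vertex joins every community whose weight
    reaches the (assumed) threshold, so it may belong to several at once."""
    return {v: {k for k, x in enumerate(w) if x >= threshold} for v, w in weights.items()}


if __name__ == "__main__":
    print(partition(memberships))
    print(overlapping(memberships))
```

Under the partition view, "bob" is forced into a single community even though his weights are almost evenly split between two; the overlapping view keeps both memberships, which is the extra structure that makes the problem harder than clustering.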