Conference: 2011 IEEE 27th International Conference on Data Engineering

Mining large graphs: Algorithms, inference, and discoveries

Abstract

How do we find patterns and anomalies in graphs with billions of nodes and edges, which do not fit in memory? How can we use parallelism for such terabyte-scale graphs? In this work, we focus on inference, which often corresponds, intuitively, to “guilt by association” scenarios. For example, if a person is a drug abuser, his or her friends probably are, too; if a node in a social network is male, his dates are probably female. We show how to do inference on such huge graphs through our proposed HAdoop Line graph Fixed Point (Ha-Lfp), an efficient parallel algorithm for sparse billion-scale graphs that runs on the Hadoop platform. Our contributions include (a) the design of Ha-Lfp, observing that it corresponds to a fixed point on a line graph induced from the original graph; (b) a scalability analysis, showing that our algorithm scales well with the number of edges as well as with the number of machines; and (c) experimental results on two private graphs and two of the largest publicly available graphs: the Web Graphs from Yahoo! (6.6 billion edges and 0.24 terabytes) and the Twitter graph (3.7 billion edges and 0.13 terabytes). We evaluated our algorithm using M45, one of the top 50 fastest supercomputers in the world, and we report patterns and anomalies discovered by our algorithm that would otherwise be invisible.
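
The abstract does not spell out the update rule, so the following is only a minimal single-machine sketch of the general idea of "a fixed point on a line graph induced from the original graph". It assumes a standard two-state belief-propagation update; the function names, the `same` coupling parameter, and the toy graph are hypothetical and are not taken from the paper, and nothing here reflects the Hadoop implementation.

```python
from collections import defaultdict


def directed_line_graph(edges):
    """For each directed edge (i, j), list the directed edges (k, i), k != j, that feed it.

    Assumption: every undirected edge is split into two directed edges, and each
    directed edge becomes one node of the induced line graph.
    """
    directed = set()
    for u, v in edges:
        directed.add((u, v))
        directed.add((v, u))
    incoming = defaultdict(list)
    for k, i in directed:
        incoming[i].append((k, i))
    return {(i, j): [(k, t) for (k, t) in incoming[i] if k != j] for (i, j) in directed}


def fixed_point_beliefs(edges, priors, same=0.9, tol=1e-9, max_iter=100):
    """Iterate two-state messages over the line graph until they stop changing."""
    deps = directed_line_graph(edges)
    msgs = {e: (0.5, 0.5) for e in deps}
    # Hypothetical coupling: neighbours share a state with probability `same`.
    psi = ((same, 1.0 - same), (1.0 - same, same))
    for _ in range(max_iter):
        new = {}
        for (i, j), preds in deps.items():
            h = list(priors.get(i, (0.5, 0.5)))
            for p in preds:  # combine every message into i except the one from j
                h[0] *= msgs[p][0]
                h[1] *= msgs[p][1]
            out = [h[0] * psi[0][t] + h[1] * psi[1][t] for t in (0, 1)]
            z = out[0] + out[1]
            new[(i, j)] = (out[0] / z, out[1] / z)
        delta = max(abs(new[e][0] - msgs[e][0]) for e in msgs)
        msgs = new
        if delta < tol:
            break
    beliefs = {}
    for n in {x for e in edges for x in e}:
        b = list(priors.get(n, (0.5, 0.5)))
        for (_, t), m in msgs.items():
            if t == n:  # multiply in every message arriving at n
                b[0] *= m[0]
                b[1] *= m[1]
        z = b[0] + b[1]
        beliefs[n] = (b[0] / z, b[1] / z)
    return beliefs


# Toy path graph 0 - 1 - 2 where only node 0 has an informative prior.
toy_edges = [(0, 1), (1, 2)]
toy_beliefs = fixed_point_beliefs(toy_edges, {0: (0.9, 0.1)})
```

In this toy run the belief in state 0 weakens with distance from node 0, which is the "guilt by association" effect in miniature. Each pass only reads the previous messages, which hints at why such an update might map onto a parallel platform, but that mapping is not shown here.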
