Scalable Analysis of Open Data Graphs

Abstract

We have studied Open Data as a connected graph in which each data package is a vertex, and we examined the similarity graphs induced by several different similarity measures. We analyzed each resulting similarity graph with several metrics to estimate its quality and informativeness. To cope with the size of the Open Data graph (over 6 billion edges), graph construction and analysis were performed with Apache Spark, a distributed computation framework. The algorithms were implemented using the algebra of Spark resilient distributed datasets (RDDs) and executed on Google Cloud Platform (GCP).
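The abstract does not name the similarity measures or metrics used. The following is a small, single-machine sketch of the general idea, under assumptions that are not from the paper. Each data package is summarized by a hypothetical set of descriptive tokens, such as tags or column names. Similarity is measured with Jaccard similarity, and an edge is kept above an illustrative threshold. Two simple graph metrics are then computed: vertex degree and connected components. The inverted-index step only stands in for the kind of key-based grouping that would be distributed with Spark RDD operations. It is not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy input: each data package is summarized by a set of
# descriptive tokens (e.g. tags or column names). Not from the paper.
PACKAGES = {
    "air_quality_2019": {"air", "pollution", "city", "2019"},
    "air_quality_2020": {"air", "pollution", "city", "2020"},
    "bus_routes": {"transit", "bus", "city"},
    "bus_stops": {"transit", "bus", "stops"},
    "library_hours": {"library", "hours"},
}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_similarity_graph(packages, threshold):
    """Return {(u, v): similarity} for pairs with similarity >= threshold."""
    # Inverted index token -> packages: only packages sharing at least one
    # token become candidate pairs (analogous to a key-based shuffle in Spark).
    index = defaultdict(set)
    for pkg, tokens in packages.items():
        for token in tokens:
            index[token].add(pkg)

    candidates = set()
    for pkgs in index.values():
        candidates.update(combinations(sorted(pkgs), 2))

    edges = {}
    for u, v in candidates:
        sim = jaccard(packages[u], packages[v])
        if sim >= threshold:
            edges[(u, v)] = sim
    return edges


def degrees(nodes, edges):
    """Vertex degree in the undirected similarity graph (isolated -> 0)."""
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def connected_components(nodes, edges):
    """Connected components via union-find with path halving."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return list(groups.values())


if __name__ == "__main__":
    edges = build_similarity_graph(PACKAGES, threshold=0.3)
    for (u, v), sim in sorted(edges.items()):
        print(f"{u} -- {v}: {sim:.2f}")
    print("degrees:", degrees(PACKAGES, edges))
    print("components:", len(connected_components(PACKAGES, edges)))
```

With the toy data and a threshold of 0.3, only the two air-quality packages (0.60) and the two bus packages (0.50) are linked. The graph therefore has three connected components, one of which is the isolated `library_hours` vertex. Generating candidates from shared tokens avoids scoring every pair of packages. However, a very common token still produces a quadratic number of candidates. At the scale reported in the abstract, that cost is why a distributed framework becomes necessary.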