Conference paper, International Conference on Management of Data

Efficient External-Memory Bisimulation on DAGs

Abstract

In this paper we introduce the first efficient external-memory algorithm to compute the bisimilarity equivalence classes of a directed acyclic graph (DAG). DAGs are commonly used to model data in a wide variety of practical applications, ranging from XML documents and data provenance models to web taxonomies and scientific workflows. In the study of efficient reasoning over massive graphs, the notion of node bisimilarity plays a central role. For example, grouping together bisimilar nodes in an XML data set is the first step in many sophisticated approaches to building indexing data structures for efficient XPath query evaluation. To date, however, only internal-memory bisimulation algorithms have been investigated. As the size of real-world DAG data sets often exceeds available main memory, storage in external memory becomes necessary. Hence, there is a practical need for an efficient approach to computing bisimulation in external memory. Our general algorithm has a worst-case IO complexity of O(Sort(|N| + |E|)), where |N| and |E| are the numbers of nodes and edges, respectively, in the data graph, and Sort(n) is the number of accesses to external memory needed to sort an input of size n. We also study specializations of this algorithm to common variations of bisimulation for tree-structured XML data sets. We empirically verify the efficient performance of the algorithms on graphs and XML documents having billions of nodes and edges, and find that the algorithms can process such graphs efficiently even when very limited internal memory is available. The proposed algorithms are simple enough for practical implementation and use, and open the door for further study of external-memory bisimulation algorithms. To this end, the full open-source C++ implementation has been made freely available.
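To make "bisimilarity equivalence classes of a DAG" concrete, here is a minimal in-memory Python sketch. It is not the paper's external-memory algorithm (which the abstract describes as achieving O(Sort(|N| + |E|)) IOs, with a C++ implementation); it only illustrates what is being computed. It assumes the common label-preserving, child-based (forward) notion of bisimulation: two nodes are bisimilar when they carry the same label and have the same set of bisimulation classes among their children. Because the graph is acyclic, nodes can be processed bottom-up, so every child already has a class identifier when its parent is handled. The function name `bisimulation_classes` and the dictionary-based input format are illustrative assumptions.

```python
from graphlib import TopologicalSorter


def bisimulation_classes(labels, children):
    """Map each node of a labeled DAG to a class id.

    Two nodes receive the same id iff they are (forward) bisimilar, assuming:
      labels:   dict node -> label (must contain every node)
      children: dict node -> iterable of child nodes (graph must be acyclic)
    """
    # TopologicalSorter expects node -> predecessors. Passing the children as
    # "predecessors" makes static_order() emit every child before its parents,
    # i.e. a bottom-up (reverse topological) order. A cycle raises CycleError.
    graph = {n: set(children.get(n, ())) for n in labels}
    order = TopologicalSorter(graph).static_order()

    signature_ids = {}  # (label, frozenset of child class ids) -> class id
    block = {}          # node -> class id
    for node in order:
        # A set (not a multiset) of child classes: bisimulation ignores how
        # many bisimilar children a node has.
        sig = (labels[node], frozenset(block[c] for c in children.get(node, ())))
        block[node] = signature_ids.setdefault(sig, len(signature_ids))
    return block


if __name__ == "__main__":
    # A small XML-like tree: y has two c-children, x has one.
    labels = {"r": "a", "x": "b", "y": "b", "u": "c", "v": "c", "w": "c"}
    children = {"r": ["x", "y"], "x": ["u"], "y": ["v", "w"]}
    print(bisimulation_classes(labels, children))
```

In the example, `x` and `y` fall into the same class even though `y` has more children, because both have exactly one child class (leaves labeled `c`). The dictionaries here assume everything fits in main memory; an external-memory variant would presumably have to replace the dictionary lookups with sort-and-scan passes over node and edge files, and the paper should be consulted for how its algorithm actually does this.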
