
Efficient data management and keyword-based association discovery on graph data of large scale.



Abstract

Graphs are widely used to model problems in domains such as bioinformatics, cheminformatics, and the Semantic Web. This work addresses two questions: how to store and query graph data efficiently, and how to express and efficiently answer complex search queries.

Existing graph storage and query evaluation techniques mostly store graph data in relational tables and transform graph queries into SQL queries. The mismatch between the rigid relational model and the flexible graph model prevents these techniques from simultaneously preserving the semantics of graph data, achieving high storage efficiency, and achieving high query efficiency. We propose to take advantage of the mature storage and query evaluation techniques developed for semi-structured data: graph data is decomposed into XML trees that are stored in an XML repository, and each graph query is transformed into XML queries that are evaluated in that repository. Our experimental results show that the RDF-to-XML decomposition can meet all three criteria.

We studied search applications in bioinformatics, health informatics, and social networks, and observed that finding paths that satisfy constraints in a graph is critical to these search scenarios. We abstract such search requests and formally define the problem of constraint acyclic path (CAP) discovery. We study how to express CAP queries and propose a new graph query language, constraint SPARQL (cSPARQL), which can express both CAP search queries and more complex pattern-matching queries that are combined with CAP discovery.

We propose efficient algorithms for the CAP discovery problem. The constraint DFS algorithms (cDFS and ecDFS) are based on DFS graph traversal with efficient pruning of search branches. Localized Search & Join (S&J) uses local information to limit the search range and prune more effectively. We implement the algorithms in a prototype system, Conkar, that can be applied to multiple domains, e.g. drug discovery.
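The abstract does not give the formal CAP definition or the details of cDFS, so the following is only a minimal sketch of the general idea of DFS-based path search with pruning. It assumes, purely for illustration, that a constraint consists of a set of nodes the path must visit plus a bound on path length. The function name `constrained_acyclic_paths` and its parameters are hypothetical and are not taken from the thesis or from Conkar.

```python
from typing import Dict, Hashable, Iterable, Iterator, List

# Adjacency-list representation of a directed graph (an assumption of this sketch).
Graph = Dict[Hashable, List[Hashable]]


def constrained_acyclic_paths(
    graph: Graph,
    source: Hashable,
    target: Hashable,
    required: Iterable[Hashable],
    max_edges: int,
) -> Iterator[List[Hashable]]:
    """Yield acyclic paths from source to target that visit every node in
    `required` and use at most `max_edges` edges.

    Illustrative only: the constraint form (required nodes plus a length
    bound) is an assumption, not the thesis's definition of CAP.
    """
    path = [source]      # nodes on the current DFS branch, in order
    on_path = {source}   # same nodes as a set, for O(1) cycle checks

    def dfs(node, missing):
        if node == target:
            # An acyclic path ends the first time it reaches the target.
            if not missing:
                yield list(path)
            return
        edges_left = max_edges - (len(path) - 1)
        # Pruning: every unvisited required node, and the target itself,
        # costs at least one more edge. Abandon branches that cannot fit.
        if edges_left < len(missing | {target}):
            return
        for nxt in graph.get(node, ()):
            if nxt in on_path:
                continue  # skipping visited nodes keeps the path acyclic
            path.append(nxt)
            on_path.add(nxt)
            yield from dfs(nxt, missing - {nxt})
            on_path.remove(nxt)
            path.pop()

    yield from dfs(source, set(required) - {source})


# Small made-up graph for demonstration.
demo_graph: Graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["d"], "d": ["a"]}
demo_paths = list(constrained_acyclic_paths(demo_graph, "a", "d", {"c"}, 3))
```

On this demo graph, the path `a, b, d` is rejected because it misses the required node `c`. With a bound of two edges, the branch through `b` is cut before it is expanded, because it could no longer reach both `c` and `d`.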

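Bibliographic record

  • Author

    Zhou, Mo

  • Author affiliation

    Indiana University

  • Degree-granting institution: Indiana University
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 149 p.
  • Total pages: 149
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification:
  • Keywords: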
