4th ACM International Workshop on Data and Text Mining in Bioinformatics (2010)

Processing SPARQL Queries with Regular Expressions in RDF Databases

Abstract

As the Resource Description Framework (RDF) data model is widely used for modeling and sharing online bioinformatics resources such as UniProt (dev.isb-sib.ch/projects/uniprot-rdf) and Bio2RDF (bio2rdf.org), SPARQL, the W3C-recommended query language for RDF, has become an important language for querying bioinformatics knowledge bases. Moreover, because users' information needs over RDF data are diverse and users often do not know the exact value of each fact stored in an RDF database, it is desirable to query RDF data with SPARQL queries that contain regular expression patterns. To the best of our knowledge, no existing work efficiently supports regular expression processing in SPARQL over RDF databases: most existing techniques for processing regular expressions are designed either for querying a text corpus or only for matching paths in an RDF graph. In this paper, we propose a novel framework for supporting regular expression processing in SPARQL queries. Our contributions are as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model so that the proposed framework can be integrated into existing query optimizers. 3) We build a prototype of the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Experiments with a full-blown RDF engine show that our framework outperforms existing approaches by up to two orders of magnitude when processing SPARQL queries with regular expression patterns.
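To make the setting concrete, the sketch below shows what a SPARQL query with a regular expression pattern asks for and how a straightforward engine might evaluate it: match the triple patterns first, then apply the `FILTER regex(...)` test to every candidate binding. It is a minimal Python illustration under stated assumptions, not the framework proposed in this paper, whose internals the abstract does not describe. The `ex:` URIs, the toy triples, and the helper names `match_pattern`, `join`, and `evaluate` are hypothetical. Python's `re` module only approximates SPARQL `regex()`, which is defined on XPath regular expressions; here only the `i` flag is supported, mapped to `re.IGNORECASE`.

```python
import re

# Hypothetical toy data as (subject, predicate, object) triples. The URIs and
# values are illustrative only, not taken from UniProt or Bio2RDF.
TRIPLES = [
    ("ex:P1", "ex:name", "Hemoglobin subunit alpha"),
    ("ex:P1", "ex:organism", "ex:Human"),
    ("ex:P2", "ex:name", "Hemoglobin subunit beta"),
    ("ex:P2", "ex:organism", "ex:Mouse"),
    ("ex:P3", "ex:name", "Myoglobin"),
    ("ex:P3", "ex:organism", "ex:Human"),
]

# The SPARQL query this sketch mimics (kept for reference; it is not parsed here).
QUERY = """
SELECT ?p ?name WHERE {
  ?p ex:name ?name .
  ?p ex:organism ex:Human .
  FILTER regex(?name, "^hemoglobin", "i")
}
"""

def match_pattern(triples, pattern):
    """Yield variable bindings for one triple pattern; terms starting with '?' are variables."""
    for triple in triples:
        binding = {}
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated within one pattern must bind to the same value.
                if binding.get(term, value) != value:
                    ok = False
                    break
                binding[term] = value
            elif term != value:
                ok = False
                break
        if ok:
            yield binding

def join(left, right):
    """Nested-loop join of two lists of bindings on their shared variables."""
    out = []
    for left_b in left:
        for right_b in right:
            shared = left_b.keys() & right_b.keys()
            if all(left_b[v] == right_b[v] for v in shared):
                out.append({**left_b, **right_b})
    return out

def evaluate(triples, patterns, var, regex, flags=""):
    """Naive plan: evaluate the basic graph pattern, then apply the regex FILTER."""
    results = [{}]
    for pattern in patterns:
        results = join(results, list(match_pattern(triples, pattern)))
    compiled = re.compile(regex, re.IGNORECASE if "i" in flags else 0)
    # SPARQL regex() succeeds if the pattern matches anywhere in the string,
    # which corresponds to re.search rather than re.match.
    return [b for b in results if compiled.search(b[var])]
```

Calling `evaluate(TRIPLES, [("?p", "ex:name", "?name"), ("?p", "ex:organism", "ex:Human")], "?name", "^hemoglobin", "i")` keeps only the binding for `ex:P1`. Under this naive plan every literal bound to `?name` must be materialized and tested against the pattern, so the regular expression does nothing to prune the search; that cost is the kind of overhead a regex-aware query plan would try to reduce.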