Conference: 19th International World Wide Web Conference 2010

LCA-based Selection for XML Document Collections

Abstract

In this paper, we address the problem of database selection for XML document collections: given a set of collections and a user query, how to rank the collections by their goodness with respect to the query. Goodness is determined by the relevance of the documents in the collection to the query. We consider keyword queries and support Lowest Common Ancestor (LCA) semantics for defining query results, where the relevance of each document to a query is determined by properties of the LCA of those nodes in the XML document that contain the query keywords. To avoid evaluating queries against each document in a collection, we propose maintaining, in a preprocessing phase, information about the LCAs of all pairs of keywords in a document and using it to approximate the properties of the LCA-based results of a query. To improve storage and processing efficiency, we use appropriate summaries of the LCA information based on Bloom filters. We address both a boolean and a weighted version of the database selection problem. Our experimental results show that our approach incurs low errors in estimating the goodness of a collection and provides rankings that are very close to the actual ones.
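
The idea of summarizing pairwise LCA information in Bloom filters can be illustrated with a minimal sketch. It is not the paper's data structure or estimator; the abstract does not give those details. The sketch assumes that keywords are the whitespace-separated tokens of each element's text, that the LCA property of interest is the depth of the deepest LCA, that a single Bloom filter per document stores keys of the form `a|b|depth`, and that queries contain at least two keywords. All names (`BloomFilter`, `summarize`, `estimate_depth`, `rank_collections`, `MAX_DEPTH`) are hypothetical.

```python
import hashlib
from itertools import combinations
import xml.etree.ElementTree as ET

MAX_DEPTH = 16  # assumed upper bound on document depth (hypothetical)


class BloomFilter:
    """Plain Bloom filter; one byte per bit position for readability."""

    def __init__(self, size=4096, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = bytearray(size)

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


def subtree_keywords(elem, depth, deepest):
    """Return the keywords under elem and record, for every keyword pair,
    the depth of the deepest element whose subtree contains both."""
    words = set((elem.text or "").lower().split())
    for child in elem:
        words |= subtree_keywords(child, depth + 1, deepest)
    for pair in combinations(sorted(words), 2):
        deepest[pair] = max(deepest.get(pair, -1), depth)
    return words


def summarize(xml_text):
    """Preprocessing step: build one Bloom filter summary per document."""
    deepest = {}
    subtree_keywords(ET.fromstring(xml_text), 0, deepest)
    summary = BloomFilter()
    for (a, b), depth in deepest.items():
        summary.add(f"{a}|{b}|{depth}")
    return summary


def estimate_depth(summary, keywords):
    """Estimate the depth of the deepest LCA of the query keywords as the
    minimum over keyword pairs; None if some pair appears to be absent.
    Bloom filter false positives can inflate the estimate."""
    estimate = None
    terms = sorted({k.lower() for k in keywords})
    for a, b in combinations(terms, 2):
        pair_depth = next(
            (d for d in range(MAX_DEPTH, -1, -1) if f"{a}|{b}|{d}" in summary),
            None,
        )
        if pair_depth is None:
            return None
        estimate = pair_depth if estimate is None else min(estimate, pair_depth)
    return estimate


def rank_collections(collections, keywords):
    """Boolean variant: score a collection by the number of documents whose
    summary appears to contain every query keyword pair."""
    scores = {
        name: sum(estimate_depth(s, keywords) is not None for s in summaries)
        for name, summaries in collections.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions, the minimum over pairwise depths is an upper bound on the depth of the deepest LCA of the whole keyword set, so the summary answers queries without touching the documents. A weighted variant could replace the match count in `rank_collections` with a score derived from the estimated depth.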
