首页> 外文会议>Data Engineering, ICDE, 2009 IEEE 25th International Conference on >BinRank: Scaling Dynamic Authority-Based Search Using Materialized SubGraphs
【24h】

BinRank: Scaling Dynamic Authority-Based Search Using Materialized SubGraphs

机译:BinRank:使用物化子图扩展基于动态权限的搜索

获取原文

摘要

Dynamic authority-based keyword search algorithms, such as ObjectRank and personalized PageRank, leverage semantic link information to provide high quality, high recall search in databases and the Web. Conceptually, these algorithms require a query-time PageRank-style iterative computation over the full graph. This computation is too expensive for large graphs, and not feasible at query time. Alternatively, building an index of pre-computed results for some or all keywords involves very expensive preprocessing. We introduce BinRank, a system that approximates ObjectRank results by utilizing a hybrid approach inspired by materialized views in traditional query processing. We materialize a number of relatively small subsets of the data graph in such a way that any keyword query can be answered by running ObjectRank on only one of the sub-graphs. BinRank generates the sub-graphs by partitioning all the terms in the corpus based on their co-occurrence, executing ObjectRank for each partition using the terms to generate a set of random walk starting points, and keeping only those objects that receive non-negligible scores. The intuition is that a sub-graph that contains all objects and links relevant to a set of related terms should have all the information needed to rank objects with respect to one of these terms. We demonstrate that BinRank can achieve sub-second query execution time on the English Wikipedia dataset, while producing high quality search results that closely approximate the results of ObjectRank on the original graph. The Wikipedia link graph contains about 10^8 edges, which is at least two orders of magnitude larger than what prior state of the art dynamic authority-based search systems have been able to demonstrate. Our experimental evaluation investigates the trade-off between query execution time, quality of the results, and storage requirements of BinRank.
机译:基于动态权限的关键字搜索算法(例如ObjectRank和个性化PageRank)利用语义链接信息在数据库和Web中提供高质量,高召回率的搜索。从概念上讲,这些算法需要在整个图形上进行查询时的PageRank样式的迭代计算。这种计算对于大型图形而言过于昂贵,并且在查询时不可行。或者,为某些或所有关键字建立预先计算结果的索引会涉及非常昂贵的预处理。我们引入BinRank,该系统通过利用传统查询处理中受实例化视图启发的混合方法来近似ObjectRank结果。我们以这种方式具体化了数据图的一些相对较小的子集,从而可以通过仅在一个子图上运行ObjectRank来回答任何关键字查询。 BinRank通过根据语料库的共现性对语料库中的所有术语进行划分来生成子图,使用这些术语为每个分区执行ObjectRank以生成一组随机行走起点,并仅保留那些接收到不可忽略分数的对象。直觉是,包含与一组相关术语相关的所有对象和链接的子图应具有针对这些术语之一对对象进行排名所需的所有信息。我们证明BinRank可以在英语Wikipedia数据集上实现亚秒级的查询执行时间,同时产生与原始图上的ObjectRank结果非常接近的高质量搜索结果。 Wikipedia链接图包含大约10 ^ 8个边,比现有的基于动态权限的搜索系统能够展示的现有状态至少大两个数量级。我们的实验评估调查了查询执行时间,结果质量和BinRank的存储需求之间的权衡。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号