首页> 外文期刊>Knowledge and Data Engineering, IEEE Transactions on >Pareto-Based Dominant Graph: An Efficient Indexing Structure to Answer Top-K Queries
【24h】

Pareto-Based Dominant Graph: An Efficient Indexing Structure to Answer Top-K Queries

机译:基于帕累托的优势图:一种有效的索引结构来回答前K个查询

获取原文
获取原文并翻译 | 示例

摘要

Given a record set D and a query score function F, a top-k query returns k records from D, whose values of function F on their attributes are the highest. In this paper, we investigate the intrinsic connection between top-k queries and dominant relationships between records, and based on which, we propose an efficient layer-based indexing structure, Pareto-Based Dominant Graph (DG), to improve the query efficiency. Specifically, DG is built offline to express the dominant relationship between records and top-k query is implemented as a graph traversal problem, i.e., Traveler algorithm. We prove theoretically that the size of search space (that is the number of retrieved records from the record set to answer top-k query) in our algorithm is directly related to the cardinality of skyline points in the record set (see Theorem 3). Considering I/O cost, we propose cluster-based storage schema to reduce I/O cost in Traveler algorithm. We also propose the cost estimation methods in this paper. Based on cost analysis, we propose an optimization technique, pseudorecord, to further improve the search efficiency. In order to handle the top-k query in the high-dimension record set, we also propose N-Way Traveler algorithm. In order to handle DG maintenance efficiently, we propose ȁC;InsertionȁD; and ȁC;DeletionȁD; algorithms for DG. Finally, extensive experiments demonstrate that our proposed methods have significant improvement over its counterparts, including both classical and state art of top-k algorithms.
机译:给定一个记录集D和一个查询得分函数F,前k个查询从D返回k条记录,这些记录在其属性中函数F的值最高。在本文中,我们研究了前k个查询与记录之间的主导关系之间的内在联系,并在此基础上,提出了一种有效的基于层的索引结构,即基于Pareto的优势图(DG),以提高查询效率。具体而言,DG是脱机构建的,用于表达记录之间的主导关系,而top-k查询则作为图遍历问题(即Traveler算法)实现。从理论上讲,我们的算法证明了搜索空间的大小(即从记录集中检索的记录以回答top-k查询的数量)与记录集中天际点的基数直接相关(请参见定理3)。考虑到I / O成本,我们提出了基于集群的存储方案以降低Traveler算法的I / O成本。我们还提出了本文的成本估算方法。基于成本分析,我们提出了一种优化技术伪记录,以进一步提高搜索效率。为了处理高维记录集中的前k个查询,我们还提出了N-Way Traveler算法。为了有效地处理DG维护,我们建议ȁC;InsertionȁD;和ȁC;删除ȁD; DG的算法。最后,大量实验表明,我们提出的方法相对于同类方法(包括传统和top-k算法的最新技术)具有明显的改进。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号