...
首页> 外文期刊>The VLDB journal >Optimal algorithms for selecting top-k combinations of attributes: theory and applications
【24h】

Optimal algorithms for selecting top-k combinations of attributes: theory and applications

机译:选择属性的前k个组合的最佳算法:理论和应用

获取原文
获取原文并翻译 | 示例
   

获取外文期刊封面封底 >>

       

摘要

Traditional top-k algorithms, e.g., TA and NRA, have been successfully applied in many areas such as information retrieval, data mining and databases. They are designed to discover k objects, e.g., top-k restaurants, with highest overall scores aggregated from different attributes, e.g., price and location. However, new emerging applications like query recommendation require providing the best combinations of attributes, instead of objects. The straightforward extension based on the existing top-k algorithms is prohibitively expensive to answer top-k combinations because they need to enumerate all the possible combinations, which is exponential to the number of attributes. In this article, we formalize a novel type of top-k query, called top-k, m, which aims to find top-k combinations of attributes based on the overall scores of the top-m objects within each combination, where m is the number of objects forming a combination. We propose a family of efficient top-k, m algorithms with different data access methods, i.e., sorted accesses and random accesses and different query certainties, i.e., exact query processing and approximate query processing. Theoretically, we prove that our algorithms are instance optimal and analyze the bound of the depth of accesses. We further develop optimizations for efficient query evaluation to reduce the computational and the memory costs and the number of accesses. We provide a case study on the real applications of top-k, m queries for an online biomedical search engine. Finally, we perform comprehensive experiments to demonstrate the scalability and efficiency of top-k, m algorithms on multiple real-life datasets.
机译:传统的top-k算法(例如TA和NRA)已成功应用于许多领域,例如信息检索,数据挖掘和数据库。它们旨在发现k个对象,例如顶级k餐馆,它们从不同的属性(例如价格和位置)中获得最高的总得分。但是,诸如查询推荐之类的新兴应用程序要求提供属性的最佳组合,而不是对象。基于现有的top-k算法的直接扩展方法对于回答top-k组合非常昂贵,因为它们需要枚举所有可能的组合,这与属性的数量成指数关系。在本文中,我们正式确定了一种新颖的top-k查询类型,称为top-k,m,该查询的目的是根据每种组合中top-m个对象的总体得分来查找top-k个属性组合。形成组合的对象数。我们提出了一系列有效的top-k,m算法,它们具有不同的数据访问方法,即排序访问和随机访问以及不同的查询确定性,即精确查询处理和近似查询处理。从理论上讲,我们证明了我们的算法是实例最优的,并分析了访问深度的界限。我们进一步开发了用于有效查询评估的优化,以减少计算和内存成本以及访问次数。我们提供了一个在线生物医学搜索引擎的前k个,m个查询的实际应用的案例研究。最后,我们进行了全面的实验,以证明top-k,m算法在多个真实数据集上的可扩展性和效率。

著录项

  • 来源
    《The VLDB journal》 |2018年第1期|27-52|共26页
  • 作者单位

    Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92103 USA;

    Univ Helsinki, Dept Comp Sci, Helsinki, Finland;

    Renmin Univ China, Sch Informat, Beijing, Peoples R China;

    Univ Calif San Diego, Dept Comp Sci & Engn, San Diego, CA 92103 USA;

    Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore;

  • 收录信息 美国《科学引文索引》(SCI);
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

    Top-k query; Top-k m query; Instance optimal algorithm;

    机译:Top-k查询;Top-k m查询;实例最优算法;

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号