首页> 外文期刊>Distributed and Parallel Databases >Pruning techniques for parallel processing of reverse top-k queries
【24h】

Pruning techniques for parallel processing of reverse top-k queries

机译:反向顶级k查询的并行处理修剪技术

获取原文
获取原文并翻译 | 示例

摘要

In this paper, we address the problem of processing reverse top-k queries in a parallel setting. Given a database of objects, a set of user preferences, and a query object q, the reverse top-k query returns the subset of user preferences for which the query object belongs to the top-k results. Although recently the reverse top-k query operator has been studied extensively, its CPU-intensive nature results in prohibitively expensive processing cost, when applied on vast-sized data sets. This limitation motivates us to explore a scalable parallel processing solution, in order to enable reverse top-k processing over distributed large sets of input data in reasonable execution time. We present an algorithmic framework for the problem, in which different algorithms can be instantiated, targeting a generic parallel setting. We describe a parallel algorithm (DiPaRT) that exploits basic pruning properties and is provably correct, as an instantiation of the framework. Furthermore, we introduce novel pruning properties for the problem, and propose DiPaRT+ as another instance of the algorithmic framework, which offers improved efficiency and scales gracefully. All algorithms are implemented in MapReduce, and we provide a wide set of experiments that demonstrate the improved efficiency of DiPaRT+ using data sets that are four orders of magnitude larger than those handled by centralized approaches.
机译:在本文中,我们解决了并行设置中处理反向顶-K查询的问题。给定对象的数据库,一组用户首选项和查询对象Q,反向Top-k查询返回Query对象所属的用户首选项的子集。尽管最近,反向Top-K查询操作员已经广泛研究,但在庞大的数据集上应用时,其CPU密集型性质导致昂贵的处理成本。该限制使我们探讨了可伸缩的并行处理解决方案,以便在合理的执行时间下使REVERE TOP-K处理在分布式大型输入数据上通过分布式大量输入数据进行处理。我们为问题提出了一种算法框架,其中可以实例化不同的算法,针对通用并行设置。我们描述了一种并行算法(DIPART),可利用基本修剪特性,并且是可证实的,作为框架的实例化。此外,我们为该问题引入了新颖的修剪特性,并提出了Dipart +作为算法框架的另一个例子,它可以优雅地提供提高的效率和秤。所有算法都是在MapReduce中实现的,我们提供了一系列实验,该实验展示了Dipart +的提高效率,这些数据集是由集中方法处理的四个数量级的四个数量级。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号