Journal: Distributed and Parallel Databases

A parallel computation of skyline using multiple regression analysis-based filtering on MapReduce

Abstract

Over the last decade, skyline query processing has become increasingly important because of its usefulness in decision-making applications. Because the datasets used for skyline query processing are huge, MapReduce-based skyline query processing algorithms have been widely studied. However, existing algorithms achieve low filtering efficiency during local skyline computation, and they unrealistically assume both uniform data distributions and independence among dimensions. In this paper, we propose a parallel skyline query processing algorithm for MapReduce based on multiple regression analysis. The goal of our algorithm is to find the skyline of a large dataset efficiently by reducing the number of candidate points before the skyline computation. To support skyline computation on anti-correlated datasets, we compute a data-filtering threshold line from a multiple regression analysis of a sample of the dataset. To guarantee the accuracy of the skyline result, we consider both the filtering threshold line and a grid-based cell dominance condition, so that only relevant data are processed in the actual skyline computation step. For local skyline computation, we use an angle-based partitioning of the data space that effectively eliminates non-promising points within partitions. For global skyline computation, we use the dominance relationships among grid-based partitions to prune unnecessary skyline points. Performance analyses show that our parallel skyline query processing algorithm outperforms existing algorithms under various settings.
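The abstract names the ingredients of the pipeline but not their exact rules, so the following is only a minimal single-process Python sketch under stated assumptions, not the authors' implementation. It assumes smaller values are better in every dimension and coordinates lie in [0, 1]; the "threshold line" is modeled as an ordinary least-squares plane that predicts the last attribute from the others, fitted on a random sample. A point is pruned only when its grid cell is strictly dominated by the cell of a sampled point lying on or below that plane. The angular partitioning uses the common hyperspherical-angle scheme, and the global step is a plain block-nested-loop merge rather than the paper's grid-based pruning. All function names (`parallel_skyline`, `fit_regression`, `angle_partition`, and so on) and the parameters `sample_size`, `grid`, and `k` are hypothetical.

```python
import math
import random
from collections import defaultdict


def dominates(p, q):
    """True if p is no worse than q everywhere and strictly better somewhere
    (smaller is better in every dimension)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def bnl_skyline(points):
    """Block-nested-loop skyline: keep a window of mutually non-dominated points."""
    window = []
    for p in points:
        if any(dominates(w, p) for w in window):
            continue
        window = [w for w in window if not dominates(p, w)]
        window.append(p)
    return window


def fit_regression(sample):
    """Ordinary least squares of the last attribute on the others (plus an
    intercept), solved from the normal equations by Gauss-Jordan elimination."""
    X = [[1.0, *p[:-1]] for p in sample]
    y = [p[-1] for p in sample]
    n = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda row: abs(A[row][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(n):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][n] / A[i][i] for i in range(n)]  # [intercept, b1, ..., b_{d-1}]


def angle_partition(p, k):
    """Hyperspherical partition id: d-1 angles in [0, pi/2], each cut into k ranges."""
    pid = []
    for i in range(len(p) - 1):
        phi = math.atan2(math.sqrt(sum(x * x for x in p[i + 1:])), p[i])
        pid.append(min(int(phi / (math.pi / 2) * k), k - 1))
    return tuple(pid)


def parallel_skyline(points, sample_size=200, grid=10, k=3, seed=0):
    """Single-process simulation of a map (filter + partition), local skyline,
    and global skyline pipeline. Assumes coordinates in [0, 1]."""
    if not points:
        return []
    rng = random.Random(seed)
    sample = rng.sample(points, min(sample_size, len(points)))
    beta = fit_regression(sample)

    def residual(p):
        return p[-1] - (beta[0] + sum(b * x for b, x in zip(beta[1:], p[:-1])))

    def cell(p):
        return tuple(min(int(x * grid), grid - 1) for x in p)

    # Filter cells come from real (sampled) points on or below the fitted plane.
    filter_cells = {cell(p) for p in sample if residual(p) <= 0}

    def prunable(p):
        # If a filter cell is strictly smaller in every cell index, its point is
        # strictly smaller than p in every coordinate, so it dominates p.
        c = cell(p)
        return any(all(a < b for a, b in zip(f, c)) for f in filter_cells)

    # "Map": drop safely prunable points and key the rest by angular partition.
    partitions = defaultdict(list)
    for p in points:
        if not prunable(p):
            partitions[angle_partition(p, k)].append(p)
    # "Reduce" 1: local skyline per partition; "Reduce" 2: global merge.
    local = [q for part in partitions.values() for q in bnl_skyline(part)]
    return bnl_skyline(local)


if __name__ == "__main__":
    rng = random.Random(1)
    data = []
    for _ in range(2000):
        # Roughly anti-correlated: good first two attributes mean a worse third.
        a, b = rng.random(), rng.random()
        c = min(max(1.0 - 0.5 * (a + b) + rng.gauss(0, 0.1), 0.0), 1.0)
        data.append((a, b, c))
    sky = parallel_skyline(data)
    assert sorted(sky) == sorted(bnl_skyline(data))
    print(len(sky), "skyline points")
```

The cell test is what keeps this filter safe: a pruned point is always dominated by an actual data point, so no true skyline point can be discarded. The local-then-global merge is also exact, because every global skyline point survives in its own partition's local skyline and then dominates any non-skyline survivor during the final merge.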