Conference: International Conference on Advanced Cloud and Big Data
A Bloom Filter-based Approach for Efficient MapReduce Query Processing on Ordered Datasets

Abstract

The MapReduce processing framework is unaware of the properties of the underlying datasets. For ordered datasets (e.g., time-series data), in which records are already sorted, MapReduce still performs unnecessary sorting operations during execution. This directly results in a significant increase in execution time, as sorting a large volume of data is time-consuming. In this paper, we propose a Bloom filter-based approach to improve the performance of MapReduce when processing ordered datasets. In our approach, all records are stored in a set of Bloom filters after the Mapping phase, and data queries can be efficiently processed by checking the Bloom filters. Due to the high querying efficiency of Bloom filters, we can achieve a significant performance gain in the Reducing phase. We conduct a series of experiments to evaluate the effectiveness of our proposed approach. Our experimental results show that our approach can achieve a 2x speedup in query processing performance while also reducing CPU and memory utilization. Moreover, we evaluate the scalability of our approach when processing multiple queries, and observe that the speedup improves further as the number of queries increases.
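
As a rough illustration of the idea described in the abstract, the following minimal sketch stores the keys of each mapper output partition in its own Bloom filter and answers a point query by checking the filters instead of sorting. It is not the authors' implementation: the class `BloomFilter`, the helpers `build_filters` and `candidate_partitions`, the one-filter-per-partition layout, and the filter sizes are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """A minimal Bloom filter backed by a bytearray of bits (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions from one SHA-256 digest
        # using double hashing: pos_i = (h1 + i * h2) mod num_bits.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def build_filters(partitions):
    """Build one Bloom filter per partition of mapper output keys
    (hypothetical layout, assumed for this sketch)."""
    filters = []
    for keys in partitions:
        bf = BloomFilter()
        for key in keys:
            bf.add(key)
        filters.append(bf)
    return filters


def candidate_partitions(filters, key):
    """Return indices of partitions whose filter may contain the key."""
    return [i for i, bf in enumerate(filters) if key in bf]


if __name__ == "__main__":
    # Time-ordered keys split across two partitions (made-up sample data).
    partitions = [["2020-01-01", "2020-01-02"], ["2020-01-03", "2020-01-04"]]
    filters = build_filters(partitions)
    print(candidate_partitions(filters, "2020-01-03"))
```

A Bloom filter never reports a stored key as absent, so the partition that actually holds the key is always among the candidates. It can occasionally report a false positive, so a matching partition still needs to be checked against the real records.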