Journal: Mathematical Problems in Engineering: Theory, Methods and Applications

Accurate Counting Bloom Filters for Large-Scale Data Processing


Abstract

Bloom filters are space-efficient randomized data structures for fast membership queries, allowing false positives. Counting Bloom Filters (CBFs) perform the same operations on dynamic sets that can be updated via insertions and deletions. CBFs have been extensively used in MapReduce to accelerate large-scale data processing on large clusters by reducing the volume of datasets. The false positive probability of CBF should be made as low as possible for filtering out more redundant datasets. In this paper, we propose a multilevel optimization approach to building an Accurate Counting Bloom Filter (ACBF) for reducing the false positive probability. ACBF is constructed by partitioning the counter vector into multiple levels. We propose an optimized ACBF by maximizing the first level size, in order to minimize the false positive probability while maintaining the same functionality as CBF. Simulation results show that the optimized ACBF reduces the false positive probability by up to 98.4% at the same memory consumption compared to CBF. We also implement ACBFs in MapReduce to speed up the reduce-side join. Experiments on realistic datasets show that ACBF reduces the false positive probability by 72.3% as well as the map outputs by 33.9% and improves the join execution times by 20% compared to CBF.
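To make the baseline concrete, here is a minimal sketch of a plain Counting Bloom Filter, the structure the abstract compares against. It is not the ACBF construction, whose level layout the abstract does not spell out. The class name, the parameters `num_counters` and `num_hashes`, and the SHA-256 double-hashing scheme are illustrative assumptions rather than details from the paper.

```python
import hashlib


class CountingBloomFilter:
    """Illustrative CBF: an array of counters instead of single bits."""

    def __init__(self, num_counters: int, num_hashes: int) -> None:
        self.m = num_counters          # length of the counter vector
        self.k = num_hashes            # number of hash positions per item
        self.counters = [0] * num_counters

    def _indexes(self, item: str) -> tuple:
        # Assumed hashing scheme: derive k positions from one SHA-256
        # digest via double hashing, then drop repeated positions so each
        # counter is touched at most once per item.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        raw = [(h1 + i * h2) % self.m for i in range(self.k)]
        return tuple(dict.fromkeys(raw))

    def insert(self, item: str) -> None:
        for idx in self._indexes(item):
            self.counters[idx] += 1

    def contains(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.counters[idx] > 0 for idx in self._indexes(item))

    def delete(self, item: str) -> None:
        # Only items that were actually inserted should be deleted;
        # this check merely rejects items that are definitely absent.
        if not self.contains(item):
            raise KeyError(item)
        for idx in self._indexes(item):
            self.counters[idx] -= 1


cbf = CountingBloomFilter(num_counters=1024, num_hashes=4)
cbf.insert("user:42")
print(cbf.contains("user:42"))
cbf.delete("user:42")
print(cbf.contains("user:42"))
```

Replacing bits with counters is what makes deletion possible: removing an item decrements its counters without clearing positions that other items still rely on.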
