Published in: International Symposium on Computing and Networking Workshops

Improving Apache Spark's Cache Mechanism with LRC-Based Method Using Bloom Filter

Abstract

Memory-and-Disk caching is a common mechanism for storing temporary output in Apache Spark. However, once memory usage reaches its limit, performance degrades, because Spark's cache management is based on LRU (Least Recently Used) replacement. Existing studies have reported replacing the LRU-based cache mechanism with an LRC (Least Reference Count)-based one, since the reference count is a more accurate indicator of the likelihood of future data access. However, frequently used partitions still cannot be identified, because Spark accesses all partitions for user-driven RDD operations, even partitions that do not contain the necessary data. In this paper, we propose a cache management method that keeps the necessary partitions in memory by introducing a Bloom filter into the existing methods. The Bloom filter checks whether each partition may contain the required data, so partitions that cannot contain it are not processed. In addition, frequently used partitions can then be identified correctly by measuring the reference count of each partition. To determine the best placement of the Bloom filter, we implemented two architectures: a driver-side Bloom filter and an executor-side Bloom filter. Evaluation results showed that, on a filter-test benchmark, the driver-side implementation reduced execution time by 89% compared with the LRC-based method.
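The interplay described above (a per-partition Bloom filter that skips partitions which cannot hold the requested key, plus eviction by lowest reference count) can be illustrated with a small, self-contained sketch. This is not the authors' implementation and does not use Spark's API: the names `BloomFilter`, `Partition`, `LRCCache`, and `filter_lookup` are hypothetical, and the reference count here is a simplifying assumption (the number of lookups whose Bloom filter check passed for a partition). The pruning step runs before any partition is scanned, loosely mirroring the driver-side placement.

```python
import hashlib
from dataclasses import dataclass, field


class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-slot array (hypothetical helper)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, item: str):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


@dataclass
class Partition:
    """Hypothetical stand-in for an RDD partition with its own Bloom filter."""
    pid: int
    records: list
    bloom: BloomFilter = field(default_factory=BloomFilter)

    def __post_init__(self) -> None:
        for r in self.records:
            self.bloom.add(r)


class LRCCache:
    """Hypothetical LRC-style cache: evicts the cached partition with the lowest reference count."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.cached = {}     # pid -> Partition currently held in memory
        self.ref_count = {}  # pid -> lookups that actually needed this partition (assumed metric)

    def access(self, part: Partition) -> None:
        self.ref_count[part.pid] = self.ref_count.get(part.pid, 0) + 1
        if part.pid in self.cached:
            return
        if len(self.cached) >= self.capacity:
            victim = min(self.cached, key=lambda pid: self.ref_count[pid])
            del self.cached[victim]
        self.cached[part.pid] = part


def filter_lookup(partitions, cache, key):
    """Prune with Bloom filters first; only surviving partitions are accessed and counted."""
    hits = []
    for part in partitions:
        if not part.bloom.might_contain(key):
            continue  # skipped: never scanned, reference count unchanged
        cache.access(part)
        if key in part.records:  # exact check removes Bloom false positives
            hits.append(part.pid)
    return hits
```

For example, with partitions `Partition(0, ["a", "b"])`, `Partition(1, ["c", "d"])`, `Partition(2, ["e", "f"])` and `LRCCache(capacity=2)`, looking up `"a"` twice, then `"c"`, then `"e"` leaves partitions 0 and 2 cached (barring a Bloom false positive): partition 1 has the lowest reference count when partition 2 arrives. Without the filter, every lookup would touch all three partitions and their counts would stay equal, which is the problem the abstract attributes to plain LRC.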