
Deca: A Garbage Collection Optimizer for In-Memory Data Processing



Abstract

In-memory caching of intermediate data and active combining of data in shuffle buffers have been shown to be very effective in minimizing the recomputation and I/O cost in big data processing systems such as Spark and Flink. However, it has also been widely reported that these techniques create a large number of long-lived data objects in the heap. These objects may quickly saturate the garbage collector, especially when handling a large dataset, and hence limit the scalability of the system. To eliminate this problem, we propose a lifetime-based memory management framework, which, by automatically analyzing the user-defined functions and data types, obtains the expected lifetime of the data objects and then allocates and releases memory space accordingly to minimize the garbage collection overhead. In particular, we present Deca, a concrete implementation of our proposal on top of Spark, which transparently decomposes and groups objects with similar lifetimes into byte arrays and releases their space all at once when their lifetimes come to an end. When systems are processing very large data, Deca also provides field-oriented memory pages to ensure high compression efficiency. Extensive experimental studies using both synthetic and real datasets show that, compared to Spark, Deca is able to (1) reduce the garbage collection time by up to 99.9%, (2) reduce the memory consumption by up to 46.6% and the storage space by 23.4%, (3) achieve 1.2x to 22.7x speedup in terms of execution time in cases without data spilling and 16x to 41.6x speedup in cases with data spilling, and (4) provide performance similar to that of domain-specific systems.
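The core idea in the abstract, packing objects that share a lifetime into one byte array and dropping that array in a single step, can be illustrated with a small sketch. This is not Deca's implementation (Deca runs on Spark and the JVM); the `LifetimePage` class, the two-field record layout, and all method names are hypothetical and chosen only to show the layout idea in Python.

```python
import struct

# Hypothetical fixed-size record (id: int64, score: float64); not from the paper.
RECORD = struct.Struct("<qd")


class LifetimePage:
    """Stores records with a shared lifetime in one contiguous bytearray."""

    def __init__(self, capacity):
        # One allocation backs every record, so the collector tracks one object.
        self._buf = bytearray(capacity * RECORD.size)
        self._count = 0

    def __len__(self):
        return self._count

    def append(self, rec_id, score):
        if self._buf is None:
            raise ValueError("page already released")
        offset = self._count * RECORD.size
        if offset + RECORD.size > len(self._buf):
            raise OverflowError("page full")
        # Decompose the record into raw bytes instead of keeping an object.
        RECORD.pack_into(self._buf, offset, rec_id, score)
        self._count += 1

    def get(self, index):
        if self._buf is None:
            raise ValueError("page already released")
        if not 0 <= index < self._count:
            raise IndexError(index)
        return RECORD.unpack_from(self._buf, index * RECORD.size)

    def release(self):
        # All records reach the end of their lifetime together:
        # dropping the single buffer frees their space in one step.
        self._buf = None
        self._count = 0


page = LifetimePage(capacity=3)
page.append(1, 0.5)
page.append(2, 1.5)
print(page.get(1))
page.release()
```

The design choice the sketch highlights is that the number of heap objects stays constant regardless of how many records are cached, which is the property the abstract credits for the reduced garbage collection overhead.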

Bibliographic details

  • Source
    ACM Transactions on Computer Systems | 2018, Issue 1 | pp. 3.1-3.47 | 47 pages
  • Author affiliations

    Huazhong Univ Sci & Technol Serv Comp Technol & Syst Lab Big Data Technol & Syst Lab Sch Comp Sci & Technol 1037 Luoyu Rd Wuhan 430074 Hubei Peoples R China;

    Huazhong Univ Sci & Technol Serv Comp Technol & Syst Lab Big Data Technol & Syst Lab Sch Comp Sci & Technol 1037 Luoyu Rd Wuhan 430074 Hubei Peoples R China;

    Univ Copenhagen Dept Comp Sci DK-2100 Copenhagen Denmark;

    Huazhong Univ Sci & Technol Serv Comp Technol & Syst Lab Big Data Technol & Syst Lab Sch Comp Sci & Technol 1037 Luoyu Rd Wuhan 430074 Hubei Peoples R China;

    Alibaba Grp Hangzhou Zhejiang Peoples R China;

    Huazhong Univ Sci & Technol Wuhan Hubei Peoples R China;

    Univ Warwick Dept Comp Sci Coventry CV4 7AL W Midlands England;

    Huazhong Univ Sci & Technol Wuhan Hubei Peoples R China;

    Huazhong Univ Sci & Technol Wuhan Hubei Peoples R China;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Data processing system; distributed system; garbage collection; in-memory; memory management;

