Deca: A Garbage Collection Optimizer for In-Memory Data Processing

Shi Xuanhua; Ke Zhixiang; Zhou Yongluan; Jin Hai; Lu Lu; Zhang Xiong; He Ligang; Hu Zhenyu; Wang Fei


Abstract

In-memory caching of intermediate data and active combining of data in shuffle buffers have been shown to be very effective in minimizing the recomputation and I/O cost in big data processing systems such as Spark and Flink. However, it has also been widely reported that these techniques create a large number of long-living data objects in the heap. These objects may quickly saturate the garbage collector, especially when handling a large dataset, and hence limit the scalability of the system. To eliminate this problem, we propose a lifetime-based memory management framework, which, by automatically analyzing the user-defined functions and data types, obtains the expected lifetime of the data objects and then allocates and releases memory space accordingly to minimize the garbage collection overhead. In particular, we present Deca, a concrete implementation of our proposal on top of Spark, which transparently decomposes and groups objects with similar lifetimes into byte arrays and releases their space all at once when their lifetimes come to an end. When systems are processing very large data, Deca also provides field-oriented memory pages to ensure high compression efficiency. Extensive experimental studies using both synthetic and real datasets show that, compared to Spark, Deca is able to (1) reduce the garbage collection time by up to 99.9%, (2) reduce the memory consumption by up to 46.6% and the storage space by 23.4%, (3) achieve 1.2x to 22.7x speedup in terms of execution time in cases without data spilling and 16x to 41.6x speedup in cases with data spilling, and (4) provide similar performance compared to domain-specific systems.
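The core idea in the abstract, packing many objects that share a lifetime into one byte array and freeing them together, can be illustrated with a minimal sketch. This is not Deca's code or API: Deca works on JVM objects inside Spark and derives lifetimes by analyzing user-defined functions, whereas the sketch below is plain Python with a hand-written record layout. The class name `LifetimePage`, its methods, and the `(id, value)` record shape are all hypothetical and chosen only for illustration.

```python
import struct


class LifetimePage:
    """Hypothetical container (not part of Deca): stores fixed-size
    (id, value) records in one bytearray instead of one object per record."""

    # Assumed record layout: little-endian int64 id followed by float64 value.
    RECORD = struct.Struct("<qd")

    def __init__(self):
        self._buf = bytearray()
        self._count = 0

    def append(self, rec_id, value):
        # Decompose the record into raw bytes and append them to the page.
        self._buf += self.RECORD.pack(rec_id, value)
        self._count += 1

    def get(self, index):
        # Rebuild a record on demand from its byte offset.
        if not 0 <= index < self._count:
            raise IndexError(index)
        return self.RECORD.unpack_from(self._buf, index * self.RECORD.size)

    def __len__(self):
        return self._count

    def release(self):
        # All records share one lifetime, so they are dropped in one step
        # by discarding the single backing buffer.
        self._buf = bytearray()
        self._count = 0


# Fill one page with records that are assumed to share a lifetime.
page = LifetimePage()
for i in range(1000):
    page.append(i, i * 0.5)
```

With this layout the memory manager tracks one buffer rather than a thousand small objects, and calling `page.release()` when the cached data is no longer needed frees all of it at once. That is the effect the abstract attributes to grouping objects with similar lifetimes.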


Bibliographic details

Source
ACM Transactions on Computer Systems | 2018, Issue 1 | pp. 3.1-3.47 | 47 pages
Authors
Shi Xuanhua; Ke Zhixiang; Zhou Yongluan; Jin Hai; Lu Lu; Zhang Xiong; He Ligang; Hu Zhenyu; Wang Fei;
Author affiliations

Huazhong Univ Sci & Technol Serv Comp Technol & Syst Lab Big Data Technol & Syst Lab Sch Comp Sci & Technol 1037 Luoyu Rd Wuhan 430074 Hubei Peoples R China;

Huazhong Univ Sci & Technol Serv Comp Technol & Syst Lab Big Data Technol & Syst Lab Sch Comp Sci & Technol 1037 Luoyu Rd Wuhan 430074 Hubei Peoples R China;

Univ Copenhagen Dept Comp Sci DK-2100 Copenhagen Denmark;

Huazhong Univ Sci & Technol Serv Comp Technol & Syst Lab Big Data Technol & Syst Lab Sch Comp Sci & Technol 1037 Luoyu Rd Wuhan 430074 Hubei Peoples R China;

Alibaba Grp Hangzhou Zhejiang Peoples R China;

Huazhong Univ Sci & Technol Wuhan Hubei Peoples R China;

Univ Warwick Dept Comp Sci Coventry CV4 7AL W Midlands England;

Huazhong Univ Sci & Technol Wuhan Hubei Peoples R China;

Huazhong Univ Sci & Technol Wuhan Hubei Peoples R China;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
Data processing system; distributed system; garbage collection; in-memory; memory management;


