Lifetime-Based Memory Management for Distributed Data Processing Systems

Abstract

In-memory caching of intermediate data and eager combining of data in shuffle buffers have been shown to be very effective in minimizing the re-computation and I/O cost in distributed data processing systems like Spark and Flink. However, it has also been widely reported that these techniques create a large number of long-lived data objects in the heap, which may quickly saturate the garbage collector, especially when handling a large dataset, and hence limit the scalability of the system. To eliminate this problem, we propose a lifetime-based memory management framework, which, by automatically analyzing the user-defined functions and data types, obtains the expected lifetime of the data objects, and then allocates and releases memory space accordingly to minimize the garbage collection overhead. In particular, we present Deca, a concrete implementation of our proposal on top of Spark, which transparently decomposes and groups objects with similar lifetimes into byte arrays and releases their space altogether when their lifetimes come to an end. An extensive experimental study using both synthetic and real datasets shows that, compared to Spark, Deca is able to 1) reduce the garbage collection time by up to 99.9%, 2) achieve up to 22.7x speedup in terms of execution time in cases without data spilling and 41.6x speedup in cases with data spilling, and 3) consume up to 46.6% less memory.
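The idea of grouping objects with similar lifetimes into byte arrays and releasing them together can be illustrated with a minimal sketch. This is not Deca's implementation and does not use any Spark API: the `LifetimeRegion` class, its methods, and the fixed record layout (one 64-bit integer key, one 64-bit float value) are all hypothetical names and assumptions chosen for illustration, and Python's memory manager differs from the JVM garbage collector the abstract refers to.

```python
import struct

# Hypothetical fixed-size record layout (an assumption for this sketch):
# one 64-bit signed integer key followed by one 64-bit float value.
RECORD = struct.Struct("<qd")


class LifetimeRegion:
    """Stores records that share one lifetime in a single contiguous byte buffer."""

    def __init__(self):
        self._buf = bytearray()
        self._count = 0

    def append(self, key, value):
        # Decompose the record into raw bytes instead of keeping one object per record.
        self._buf += RECORD.pack(key, value)
        self._count += 1

    def __len__(self):
        return self._count

    def get(self, index):
        # Rebuild a (key, value) tuple on demand from its slot in the buffer.
        if not 0 <= index < self._count:
            raise IndexError(index)
        return RECORD.unpack_from(self._buf, index * RECORD.size)

    def release(self):
        # End of the shared lifetime: drop the whole buffer at once,
        # rather than reclaiming records one by one.
        self._buf = bytearray()
        self._count = 0


region = LifetimeRegion()
for i in range(3):
    region.append(i, i * 0.5)
```

In this sketch the memory manager sees one buffer per region instead of one object per record, which is the effect the abstract attributes to packing objects into byte arrays; how Deca infers lifetimes from user-defined functions is not shown here.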

Bibliographic record

Authors: Lu Lu; Shi Xuanhua; Zhou Yongluan; Zhang Xiong; Jin Hai; Pei Cheng; He Ligang; Geng Yuanzhen
Year: 2016
Original format: PDF
Language: English

Similar literature

1. Real-time processing method for Gaofen-3 satellite data based on a multi-satellite distributed data processing system [J]. 杨军, 曹筵东, 孙光才, Journal of Central South University (English Edition). 2020, No. 3
2. Evaluation of SQL benchmark for distributed in-memory Database Management Systems [J]. Oleg Borisenko, David Badalyan, International Journal of Computer Science and Network Security. 2018, No. 10
3. Efficient distance join query processing in distributed spatial data management systems [J]. Information Sciences: An International Journal. 2020
4. Transaction Processing and Management in Distributed Database Systems [J]. International Journal of Computer Science and Technology. 2011, No. 3
5. Lifetime-Based Memory Management for Distributed Data Processing Systems [C]. Lu Lu, Xuanhua Shi, Yongluan Zhou, International Conference on Very Large Data Bases. 2016
6. Data management in distributed stream processing systems [D]. Vijayakumar, Nithya Nirmal. 2007
7. Clinical Laboratory Data Management: A Distributed Data Processing Solution [O]. Martin Levin, Raymond Morgner, Bernice Packer. 1980
8. Lifetime-Based Memory Management for Distributed Data Processing Systems [O]. Lu, Lu; Shi, Xuanhua; Zhou, Yongluan. 2016
9. Research in Functionally Distributed Computer Systems Development. Volume IX. Memory Management in a Distributed Data Base Management System [R]. Maryanski, F. J.; Wallentine, V. 1976
