ACM Computing Surveys

A Study on Garbage Collection Algorithms for Big Data Environments



Abstract

The need to process and store massive amounts of data (Big Data) is a reality. In areas such as scientific experiments, social network management, credit card fraud detection, targeted advertisement, and financial analysis, massive amounts of information are generated and processed daily to extract valuable, summarized information. Due to their fast development cycle (i.e., lower development cost), owed mainly to automatic memory management, and their rich community resources, managed object-oriented programming languages (e.g., Java) are the first choice for developing Big Data platforms (e.g., Cassandra, Spark) on which such Big Data applications are executed.

However, automatic memory management comes at a cost. This cost is introduced by the garbage collector, which is responsible for collecting objects that are no longer in use. Although current (classic) garbage collection algorithms may be adequate for small-scale applications, they are not appropriate for large-scale Big Data environments, as they do not scale in terms of throughput and pause times.

In this work, current Big Data platforms and their memory profiles are studied to understand why classic algorithms (which are still the most commonly used) are not appropriate, and to analyze recently proposed, relevant memory management algorithms targeted at Big Data environments. The scalability of these recent algorithms is characterized, relative to classic algorithms, in terms of throughput (how much they improve application throughput) and pause time (how much they reduce application latency). The study concludes with a taxonomy of the described works and a set of open problems in Big Data memory management that could be addressed in future work.
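As a minimal illustration (not taken from the surveyed work), the two scalability metrics the abstract refers to can be observed on any standard JVM through the `java.lang.management` API: each collector exposes a cumulative collection count and the total time spent collecting, which approximates the stop-the-world pause cost paid for automatic memory management. The allocation loop below is only a hypothetical workload used to trigger some minor collections.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {
    public static void main(String[] args) {
        // Hypothetical workload: allocate many short-lived objects so the
        // garbage collector has something to reclaim.
        for (int i = 0; i < 1_000_000; i++) {
            byte[] garbage = new byte[256];
        }

        long totalCollections = 0;
        long totalTimeMs = 0;
        // One MXBean per collector (e.g., young- and old-generation collectors).
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both accessors return -1 if the value is unavailable.
            totalCollections += Math.max(0, gc.getCollectionCount());
            totalTimeMs += Math.max(0, gc.getCollectionTime());
        }

        // Cumulative collection time is a rough proxy for the pause-time cost;
        // time not spent here is available to the application (throughput).
        System.out.println("collections=" + totalCollections + " pauseMs=" + totalTimeMs);
    }
}
```

Comparing these counters across collectors (e.g., throughput-oriented Parallel GC versus pause-oriented G1) is one simple way to see the throughput versus pause-time trade-off that the surveyed algorithms try to improve at Big Data scale.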
