A Workload-Specific Memory Capacity Configuration Approach for In-Memory Data Analytic Platforms

Abstract

In-memory data analytic platforms such as Spark are now widely adopted in big data processing. Proper memory capacity configuration has been shown to be an effective way to guarantee workload performance on such platforms. Currently, Spark configures the memory capacity for workloads statically, based on user specifications. However, lacking deep knowledge of the target platform and workload characteristics, non-expert users often configure memory capacity conservatively and excessively, which significantly reduces memory utilization. Moreover, because memory requirements differ considerably among workloads, there is no one-size-fits-all solution for memory capacity configuration. To address these issues, we propose WSMC, a workload-specific memory capacity configuration approach for Spark workloads, which guides users in configuring memory capacity by accurately predicting a workload's memory requirement under various input data sizes and parameter settings. First, WSMC classifies in-memory computing workloads into four categories according to their Data Expansion Ratio. Second, WSMC establishes a memory requirement prediction model that takes into account the input data size, the shuffle data size, the parallelism of the workload, and the data block size. For an ad hoc workload, WSMC profiles its Data Expansion Ratio with small-sized input data and determines the category into which the workload falls. Users can then determine an accurate configuration in accordance with the corresponding memory requirement prediction. Through comprehensive evaluations with SparkBench workloads, we found that, compared with the default configuration, a configuration guided by WSMC saves over 40% of memory capacity with only a slight degradation in workload performance (only 5%), and that, compared with the proper configuration found manually, the configuration guided by WSMC leads to only a 7% increase in memory waste with a slight improvement in workload performance (about 1%).
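
As a rough illustration of the workflow the abstract outlines, the Python sketch below profiles a Data Expansion Ratio from a small sample run, maps it to one of four categories, and produces a memory estimate. The ratio's definition (in-memory size divided by input size), the category thresholds, and the formula in `predict_memory_bytes` are all hypothetical placeholders: the abstract names the model inputs but does not give the model itself, so this should not be read as WSMC's actual method.

```python
# Hypothetical category boundaries. The abstract states that there are four
# categories but does not give the thresholds, so these values are assumptions.
ASSUMED_DER_BOUNDS = (1.0, 2.0, 4.0)


def data_expansion_ratio(in_memory_bytes: float, input_bytes: float) -> float:
    """Assumed definition: in-memory data size divided by input data size."""
    if input_bytes <= 0:
        raise ValueError("input_bytes must be positive")
    return in_memory_bytes / input_bytes


def categorize(der: float) -> int:
    """Map a ratio to a category index 0..3 using the assumed bounds."""
    return sum(der >= bound for bound in ASSUMED_DER_BOUNDS)


def predict_memory_bytes(
    der: float,
    input_bytes: float,
    shuffle_bytes: float,
    parallelism: int,
    block_bytes: float,
) -> float:
    """Placeholder estimate built from the four inputs the abstract lists.

    Assumed form: expanded input data, plus shuffle data, plus one expanded
    block per concurrently running task. The real model is not given in the
    abstract and may differ per category.
    """
    cached = der * input_bytes
    working_set = parallelism * block_bytes * der
    return cached + shuffle_bytes + working_set


# Profile with a small sample, then estimate for the full-sized input.
sample_der = data_expansion_ratio(in_memory_bytes=300.0, input_bytes=100.0)
category = categorize(sample_der)
estimate = predict_memory_bytes(
    der=sample_der,
    input_bytes=10 * 1024**3,    # 10 GiB of input data
    shuffle_bytes=1 * 1024**3,   # 1 GiB of shuffle data
    parallelism=8,
    block_bytes=128 * 1024**2,   # 128 MiB blocks
)
```

The point of the sketch is only the two-step shape: a cheap profiling run fixes the ratio and category, and the estimate for the full input is then computed without running the workload at full scale.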

Bibliographic details

Source
15th IEEE International Symposium on Parallel and Distributed Processing with Applications and 16th IEEE International Conference on Ubiquitous Computing and Communications | 2017 | pp. 486-490 | 5 pages
Conference location: Guangzhou (CN)
Authors
Yi Liang; Shilu Chang; Chao Su
Original format: PDF
Language: English
Keywords
Memory management; Sparks; Predictive models; Task analysis; Data models; Parallel processing; Data analysis

Similar literature

1. Optimizing the Analytical Value of Oncology-Related Data Based on an In-Memory Analysis Layer: Development and Assessment of the Munich Online Comprehensive Cancer Analysis Platform [J]. Daniel Nasseh, Sophie Schneiderbauer, Michael Lange. Journal of medical Internet research. 2020, No. 4
2. Datasize-Aware High Dimensional Configurations Auto-Tuning of In-Memory Cluster Computing [J]. Zhibin Yu, Zhendong Bei, Xuehai Qian. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages. 2018, No. 2
3. Eager Memory Management for In-Memory Data Analytics [J]. Hakbeom JANG, Jonghyun BAE, Tae Jun HAM. IEICE transactions on information and systems. 2019, No. 3
4. A Workload-Specific Memory Capacity Configuration Approach for In-Memory Data Analytic Platforms [C]. Yi Liang, Shilu Chang, Chao Su. IEEE International Symposium on Parallel and Distributed Processing with Applications. 2017
5. Understanding Memory Configurations for In-Memory Analytics [D]. Reiss, Charles Albert. 2016
6. Data Processing and Information Classification—An In-Memory Approach [O]. Milena Andrighetti, Giovanna Turvani, Giulia Santoro. 2020
7. A Workload-Specific Memory Capacity Configuration Approach for In-Memory Data Analytic Platforms [O]. Liang, Yi, Chang, Shilu, Su, Chao. 2017
