A Workload-Specific Memory Capacity Configuration Approach for In-Memory Data Analytic Platforms

Abstract

Nowadays, in-memory data analytic platforms such as Spark are widely adopted for big data processing. Proper memory capacity configuration has proven to be an effective way to guarantee workload performance on such platforms. Currently, Spark configures the memory capacity for workloads statically, based on user specifications. However, because they lack deep knowledge of the target platform and of workload characteristics, non-expert users often configure memory capacity conservatively and excessively, which significantly reduces memory utilization. On the other hand, because memory requirements differ considerably across workloads, there is no one-size-fits-all solution for memory capacity configuration. To address these issues, we propose WSMC, a workload-specific memory capacity configuration approach for Spark workloads, which guides users in configuring memory capacity by accurately predicting a workload's memory requirement under various input data sizes and parameter settings. First, WSMC classifies in-memory computing workloads into four categories according to their Data Expansion Ratio. Second, WSMC establishes a memory requirement prediction model that takes into account the input data size, the shuffle data size, the workload's parallelism, and the data block size. For an ad hoc workload, WSMC can profile its Data Expansion Ratio with small-sized input data and determine the category into which the workload falls. Users can then determine an accurate configuration from the corresponding memory requirement prediction. Through comprehensive evaluations with SparkBench workloads, we found that, compared with the default configuration, the WSMC-guided configuration saves over 40% of memory capacity with only slight performance degradation (only 5%). Compared with the proper configuration found manually, the WSMC-guided configuration increases memory waste by only 7% while slightly improving workload performance (by about 1%).
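To make the workflow above concrete, the following is a minimal, hypothetical sketch of the three steps: profiling a Data Expansion Ratio (DER) on small input, assigning one of four categories, and turning a predicted requirement into a Spark setting. The abstract does not define DER, the category boundaries, or the model's form, so all of these are assumptions: DER is taken here as in-memory size divided by input size, the thresholds in `HYPOTHETICAL_DER_BOUNDS` are placeholders, and `estimate_memory_bytes` is an illustrative additive model. Only its four inputs (input size, shuffle size, parallelism, block size) come from the abstract. `spark.executor.memory` is a real Spark property; every function name here is hypothetical and is not WSMC's actual model.

```python
import math

# Hypothetical DER thresholds splitting workloads into four categories (0..3).
# WSMC uses four categories, but these boundary values are placeholders.
HYPOTHETICAL_DER_BOUNDS = (1.0, 2.0, 4.0)


def data_expansion_ratio(in_memory_bytes: float, input_bytes: float) -> float:
    """Assumed definition: in-memory data size divided by input data size."""
    if input_bytes <= 0:
        raise ValueError("input_bytes must be positive")
    return in_memory_bytes / input_bytes


def classify(der: float) -> int:
    """Count how many hypothetical thresholds the DER exceeds (yields 0..3)."""
    return sum(der > bound for bound in HYPOTHETICAL_DER_BOUNDS)


def estimate_memory_bytes(input_bytes: float, shuffle_bytes: float,
                          parallelism: int, block_bytes: float,
                          der: float) -> float:
    """Hypothetical additive model (not the paper's): expanded input,
    plus shuffle data, plus one expanded block per concurrent task."""
    cached = input_bytes * der
    per_task_working_set = block_bytes * der * parallelism
    return cached + shuffle_bytes + per_task_working_set


def executor_memory_conf(total_bytes: float, num_executors: int) -> str:
    """Split the total evenly across executors, rounding up to whole GiB."""
    per_executor_gib = math.ceil(total_bytes / num_executors / 2**30)
    return f"spark.executor.memory={per_executor_gib}g"


if __name__ == "__main__":
    GiB, MiB = 2**30, 2**20
    # Profiling run: 1 GiB of input occupied 2.5 GiB in memory.
    der = data_expansion_ratio(2.5 * GiB, 1 * GiB)
    print("DER:", der, "category:", classify(der))
    # Production run: 100 GiB input, 20 GiB shuffle, 16 tasks, 128 MiB blocks.
    total = estimate_memory_bytes(100 * GiB, 20 * GiB, 16, 128 * MiB, der)
    print(executor_memory_conf(total, num_executors=10))  # prints spark.executor.memory=28g
```

The resulting string could be passed to `spark-submit` via `--conf`. The design choice worth noting is that DER is measured once on a small sample and then reused to extrapolate to the full input, which mirrors the profiling idea in the abstract. Whether that extrapolation is linear in practice depends on the workload category.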