Conference: IEEE International Symposium on Parallel and Distributed Processing with Applications

A Workload-Specific Memory Capacity Configuration Approach for In-Memory Data Analytic Platforms



Abstract

In-memory data analytic platforms such as Spark are widely adopted for big data processing. Proper memory capacity configuration has been shown to be an effective way to guarantee workload performance on such platforms. Currently, Spark configures memory capacity for workloads statically, based on user specifications. However, lacking deep knowledge of the target platform and of workload characteristics, non-expert users often configure memory capacity conservatively and excessively, which significantly reduces memory utilization. Moreover, because memory requirements differ widely across workloads, there is no one-size-fits-all solution for memory capacity configuration. To address these issues, we propose WSMC, a workload-specific memory capacity configuration approach for Spark workloads, which guides users in configuring memory capacity by accurately predicting a workload's memory requirement under various input data sizes and parameter settings. First, WSMC classifies in-memory computing workloads into four categories according to their Data Expansion Ratio. Second, WSMC establishes a memory requirement prediction model that considers the input data size, the shuffle data size, the parallelism of the workload, and the data block size. For an ad hoc workload, WSMC profiles its Data Expansion Ratio with small-sized input data and determines the category into which the workload falls. Users can then determine an accurate configuration from the corresponding memory requirement prediction. In comprehensive evaluations with SparkBench workloads, we found that, compared with the default configuration, the configuration guided by WSMC saves over 40% of memory capacity with only a slight degradation in workload performance (only 5%); compared with the proper configuration found manually, the configuration guided by WSMC increases memory waste by only 7% while slightly improving workload performance (about 1%).
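The profile-then-classify step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define the Data Expansion Ratio or the category boundaries, so the definition used here (in-memory data size divided by input data size), the function names, and the three boundary values are all hypothetical placeholders.

```python
from bisect import bisect_right
from typing import Sequence


def data_expansion_ratio(in_memory_bytes: int, input_bytes: int) -> float:
    """Assumed definition: in-memory data size divided by input data size."""
    if input_bytes <= 0:
        raise ValueError("input_bytes must be positive")
    return in_memory_bytes / input_bytes


def classify(ratio: float, boundaries: Sequence[float]) -> int:
    """Return a category index from 0 to 3, given three ascending boundaries.

    The boundaries are hypothetical placeholders, not values from the paper.
    """
    if len(boundaries) != 3 or list(boundaries) != sorted(boundaries):
        raise ValueError("expected three ascending boundaries")
    return bisect_right(boundaries, ratio)


# Profile with a small sample: 64 MiB of input occupying 160 MiB in memory.
sample_ratio = data_expansion_ratio(160 * 2**20, 64 * 2**20)
category = classify(sample_ratio, boundaries=(0.5, 1.5, 3.0))
print(sample_ratio, category)
```

In such a scheme, the category would select which memory requirement prediction to apply, and the resulting estimate would then inform a static setting such as Spark's `spark.executor.memory`.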
