Journal of Computer Applications (《计算机应用》)

An Improved Spark Shuffle Memory Allocation Algorithm


Abstract

Shuffle performance is an important indicator of cluster performance for big data frameworks. Spark's own Shuffle memory allocation algorithm tries to allocate memory evenly to every Task in the memory pool, but experiments show that the imbalance of memory requirements across Tasks wastes memory and lowers running efficiency. To solve this problem, an improved Spark Shuffle memory allocation algorithm was proposed. Based on each Task's memory request and historical runtime data, Tasks are divided into two categories by memory requirement: "split" processing is applied to Tasks with small memory requirements, while memory is allocated to Tasks with large memory requirements according to the number of Task spills (overflows) and the waiting time after a spill. By making full use of the free memory in the pool, the algorithm adaptively adjusts Task memory allocation when data skew makes Task memory requirements unbalanced. Experimental results show that, compared with the original algorithm, the improved algorithm reduces the Task spill rate, shortens Task turnaround time, and improves the running performance of the cluster.
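The allocation policy described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the fair-share threshold, the weighting formula combining spill count and post-spill wait time, and all names (`TaskInfo`, `allocate`) are assumptions made for the example.

```scala
// Hypothetical sketch of the improved Shuffle memory allocation heuristic:
// small-demand Tasks get exactly what they request ("split" processing),
// and the freed memory is shared among large-demand Tasks, weighted by
// how often each has spilled and how long it waited after spilling.
object ShuffleMemoryAllocator {
  case class TaskInfo(id: Int, requested: Long, spillCount: Int, waitAfterSpillMs: Long)

  def allocate(poolBytes: Long, tasks: Seq[TaskInfo]): Map[Int, Long] = {
    // Assumed threshold: a Task is "small" if it requests no more than the
    // even (fair) share poolBytes / numTasks that Spark would hand out.
    val fairShare = poolBytes / math.max(tasks.size, 1)
    val (small, large) = tasks.partition(_.requested <= fairShare)

    // Small Tasks receive exactly their request, leaving the rest free.
    val smallAlloc = small.map(t => t.id -> t.requested).toMap
    val remaining  = poolBytes - smallAlloc.values.sum

    if (large.isEmpty) smallAlloc
    else {
      // Assumed weight: Tasks that spilled more often or waited longer
      // after spilling are given priority over the remaining memory.
      val weights = large.map(t =>
        t.id -> (1.0 + t.spillCount + t.waitAfterSpillMs / 1000.0)).toMap
      val total = weights.values.sum
      val largeAlloc = large.map { t =>
        val share = (remaining * weights(t.id) / total).toLong
        t.id -> math.min(share, t.requested) // never exceed the request
      }.toMap
      smallAlloc ++ largeAlloc
    }
  }
}
```

Under this sketch a Task that has spilled repeatedly receives a larger slice of the leftover memory, which is the adaptive behaviour the abstract attributes to the algorithm under data skew.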
