Research on Spark Memory Management and Caching Strategies

Meng Hongtao (孟红涛); Yu Songping (余松平); Liu Fang (刘芳); Xiao Nong (肖侬)

Abstract

Spark is a big data processing framework based on the MapReduce model. It makes full use of cluster memory to accelerate data processing. Spark divides memory into regions by function: Shuffle Memory, Storage Memory, and Unroll Memory, each with its own usage characteristics. This paper first tests and analyzes the usage characteristics of Shuffle Memory and Storage Memory. The RDD (Resilient Distributed Dataset) is the most important abstraction in Spark and can be cached in cluster memory; when memory is insufficient, Spark must evict some RDD partitions to make room for new ones. The paper then proposes a new cache replacement policy, DWRP (Distributed Weight Replacement Policy), which computes a weight for each RDD partition from the time it has been stored in memory, its size, and its number of uses, and then selects the partitions to evict according to the distributed characteristics of the RDD. Finally, the performance of several cache replacement policies is tested and analyzed.
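The weight-based eviction idea in the abstract can be illustrated with a small sketch. The abstract names the inputs to the weight (time stored in memory, size, number of uses) but does not give the formula, so the `weight` function below is purely an assumption chosen for illustration, as are the names `CachedPartition` and `select_victims`. The step that accounts for the distributed characteristics of the RDD is not described in the abstract and is left out here.

```python
from dataclasses import dataclass


@dataclass
class CachedPartition:
    rdd_id: int        # id of the RDD this partition belongs to
    index: int         # partition index within the RDD
    size_bytes: int    # memory occupied by the cached partition
    stored_at: float   # time (seconds) at which it entered the cache
    use_count: int     # number of times it has been read so far


def weight(p: CachedPartition, now: float) -> float:
    """Hypothetical weight, NOT the formula from the paper.

    Assumption: a partition that is used often is worth keeping,
    while a large partition that has sat in memory for a long time
    is a better eviction candidate.
    """
    age = max(now - p.stored_at, 1.0)  # avoid division by zero
    return p.use_count / (age * p.size_bytes)


def select_victims(cache, needed_bytes, now):
    """Choose lowest-weight partitions until needed_bytes can be freed."""
    victims, freed = [], 0
    for p in sorted(cache, key=lambda part: weight(part, now)):
        if freed >= needed_bytes:
            break
        victims.append(p)
        freed += p.size_bytes
    return victims


cache = [
    CachedPartition(rdd_id=1, index=0, size_bytes=100, stored_at=0.0, use_count=5),
    CachedPartition(rdd_id=1, index=1, size_bytes=100, stored_at=0.0, use_count=1),
    CachedPartition(rdd_id=2, index=0, size_bytes=200, stored_at=50.0, use_count=4),
]
victims = select_victims(cache, needed_bytes=150, now=100.0)
```

Under these assumed weights, the rarely used partition of RDD 1 is evicted first. A real policy would also need to decide how partitions of the same RDD on different nodes influence each other, which is the part the abstract attributes to DWRP's use of distribution features.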

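Bibliographic Information

Source
Computer Science (《计算机科学》) | 2017, Issue 6 | pp. 31-35, 74 | 6 pages
Authors
Meng Hongtao; Yu Songping; Liu Fang; Xiao Nong
Author affiliation
College of Computer, National University of Defense Technology (国防科学技术大学计算机学院), Changsha 410072 (all four authors)
Original format: PDF
Language of text: Chinese
CLC classification: Programming; software engineering
Keywords
Big data; Spark memory management; RDD caching; caching strategy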

