IEEE International Conference on Cloud Computing Technology and Science

VOLUME: Enable Large-Scale In-Memory Computation on Commodity Clusters



Abstract

Traditional cloud computing technologies, such as MapReduce, use file systems as the system-wide substrate for data storage and sharing. A distributed file system provides a global name space and stores data persistently, but it also introduces significant overhead. Several recent systems use DRAM to store data and tremendously improve the performance of cloud computing systems. However, both our own experience and related work indicate that a simple substitution of distributed DRAM for the file system does not provide a solid and viable foundation for data storage and processing in the data center environment, and the capacity of such systems is limited by the amount of physical memory in the cluster. To overcome these challenges, we construct VOLUME (Virtual On-Line Unified Memory Environment), a distributed virtual memory that unifies the physical memory and disk resources on many compute nodes to form a system-wide data substrate. The new substrate provides a general memory-based abstraction, takes advantage of DRAM in the system to accelerate computation, and, transparently to programmers, scales the system to handle large datasets by swapping data to disks and remote servers. The evaluation results show that VOLUME is much faster than Hadoop/HDFS, delivering 6-11x speedups on the adjacency list workload. VOLUME is faster than both Hadoop/HDFS and Spark/RDD for in-memory sorting. For k-means clustering, VOLUME scales linearly to 160 compute nodes on the TH-1/GZ supercomputer.
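The core idea the abstract describes, a memory abstraction that keeps hot data in DRAM and transparently spills the rest to disk when physical memory runs out, can be illustrated with a toy single-node sketch. This is not the VOLUME implementation (which is distributed and swaps to remote servers as well); all class and method names here are illustrative, and a simple LRU policy stands in for whatever replacement policy the real system uses:

```python
import os
import pickle
import tempfile
from collections import OrderedDict

class SpillingStore:
    """Toy key-value substrate: hot entries live in memory (an OrderedDict
    in LRU order); when the in-memory budget is exceeded, the least
    recently used entry is transparently spilled to a file on disk."""

    def __init__(self, max_in_memory=2, spill_dir=None):
        self.max_in_memory = max_in_memory
        self.spill_dir = spill_dir or tempfile.mkdtemp(prefix="spill_sketch_")
        self.hot = OrderedDict()   # key -> value, least recently used first
        self.cold = {}             # key -> path of the spilled file

    def _spill_lru(self):
        key, value = self.hot.popitem(last=False)  # evict the LRU entry
        path = os.path.join(self.spill_dir, f"{len(self.cold)}.pkl")
        with open(path, "wb") as f:
            pickle.dump(value, f)
        self.cold[key] = path

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)                  # mark as most recently used
        while len(self.hot) > self.max_in_memory:
            self._spill_lru()

    def get(self, key):
        if key in self.hot:                        # DRAM hit: fast path
            self.hot.move_to_end(key)
            return self.hot[key]
        path = self.cold.pop(key)                  # miss: fault the value back in
        with open(path, "rb") as f:
            value = pickle.load(f)
        os.remove(path)
        self.put(key, value)                       # promote back to memory
        return value
```

A caller uses `put`/`get` as if everything were in memory; the spill-and-fault traffic is invisible, which is the "transparent to programmers" property the abstract claims for VOLUME.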
