Spark-GPU: An accelerated in-memory data processing engine on clusters

Abstract

Apache Spark is an in-memory data processing system that supports both SQL queries and advanced analytics over large data sets. In this paper, we present the design and implementation of Spark-GPU, which enables Spark to use the GPU's massively parallel processing capability to achieve both high performance and high throughput. Spark-GPU transforms a general-purpose data processing system into a GPU-supported system by addressing several real-world technical challenges: minimizing internal and external data transfers, preparing a suitable data format and a batching mode for efficient GPU execution, and determining which workloads suit the GPU, with a task scheduling capability that spans the CPU and GPU. We have comprehensively evaluated Spark-GPU with a set of representative analytical workloads to show its effectiveness. Our results show that Spark-GPU improves the performance of machine learning workloads by up to 16.13x and the performance of SQL queries by up to 4.83x.
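Two of the mechanisms named in the abstract can be illustrated in a few lines. The first is regrouping rows into a column-oriented batch format for GPU execution. The second is deciding per task whether the CPU or the GPU should run it. The Python sketch below is a hypothetical illustration of those ideas under stated assumptions, not the Spark-GPU implementation. The function names `to_column_batches` and `choose_device`, the `TaskProfile` fields, and every throughput constant are invented placeholders rather than values from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical row shape used only for illustration: (key, value).
Row = Tuple[int, float]


def to_column_batches(rows: Iterable[Row], batch_size: int) -> Iterator[Dict[str, list]]:
    """Regroup row-oriented records into fixed-size column-oriented batches.

    Each batch holds one contiguous list per column, so a kernel can scan a
    whole column of many rows in one launch instead of one row at a time.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    keys: List[int] = []
    values: List[float] = []
    for key, value in rows:
        keys.append(key)
        values.append(value)
        if len(keys) == batch_size:
            yield {"key": keys, "value": values}
            keys, values = [], []  # start fresh lists; the yielded batch is untouched
    if keys:  # emit the final, partially filled batch
        yield {"key": keys, "value": values}


@dataclass
class TaskProfile:
    """Hypothetical per-task statistics a scheduler might estimate."""
    rows: int               # input rows processed by the task
    bytes_to_transfer: int  # bytes copied host -> device if the task runs on the GPU
    ops_per_row: float      # rough arithmetic work per row


def choose_device(
    task: TaskProfile,
    link_bytes_per_s: float = 12e9,  # placeholder host-device bandwidth (assumption)
    cpu_ops_per_s: float = 5e10,     # placeholder CPU throughput (assumption)
    gpu_ops_per_s: float = 1e12,     # placeholder GPU throughput (assumption)
) -> str:
    """Return "gpu" only if estimated transfer plus GPU compute beats CPU compute."""
    work = task.rows * task.ops_per_row
    cpu_time = work / cpu_ops_per_s
    gpu_time = task.bytes_to_transfer / link_bytes_per_s + work / gpu_ops_per_s
    return "gpu" if gpu_time < cpu_time else "cpu"


if __name__ == "__main__":
    rows = [(i, float(i) * 0.5) for i in range(5)]
    for batch in to_column_batches(rows, batch_size=2):
        print(batch)
    heavy = TaskProfile(rows=1_000_000, bytes_to_transfer=8_000_000, ops_per_row=1000)
    light = TaskProfile(rows=1_000_000, bytes_to_transfer=8_000_000, ops_per_row=1)
    print(choose_device(heavy), choose_device(light))  # gpu cpu
```

In this simplified cost model, the GPU wins only when the compute time it saves outweighs the cost of moving the data across the host-device link. This is one way to see why the abstract stresses minimizing data transfers alongside workload-suitability scheduling.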

Bibliographic details

Source
IEEE International Congress on Big Data | 2016 | pp. 273-283 | 11 pages
Authors
Yuan Yuan; Meisam Fathi Salmi; Yin Huai; Kaibo Wang; Rubao Lee; Xiaodong Zhang
Original format: PDF
Keywords
Graphics processing units; Sparks; Data analysis; Java; Parallel processing; Computational modeling;

Similar literature


1. Mille Cheval: a GPU-based in-memory high-performance computing framework for accelerated processing of big-data streams [J]. Kumar Vivek, Sharma Dilip Kumar, Mishra Vinay Kumar. Journal of Supercomputing, 2021, No. 7.

2. HAMR: A dataflow-based real-time in-memory cluster computing engine [J]. Wu Yao, Zheng Long, Heilig Brian. Experimental Mechanics, 2017, No. 5.

3. Accelerating in-memory transaction processing using general purpose graphics processing units [J]. Gao Lan, Xu Yunlong, Wang Rui. Future Generation Computer Systems, 2019, Issue AUGa.

4. Spark-GPU: An accelerated in-memory data processing engine on clusters [C]. Yuan Yuan, Meisam Fathi Salmi, Yin Huai. IEEE International Congress on Big Data, 2016.

5. Distributed RDF Storage and Querying Using In-Memory Processing Engine [D]. Hassan, P. M. Mahmudul. 2021.

6. CLUSTOM-CLOUD: In-Memory Data Grid-Based Software for Clustering 16S rRNA Sequence Data in the Cloud Environment [O]. Jeongsu Oh, Chi-Hwan Choi, Min-Kyu Park.

7. Approach to Accelerating Dissolved Vector Buffer Generation in Distributed In-Memory Cluster Architecture [O]. Jinxin Shen, Luo Chen, Ye Wu. 2018.

8. Processing of Engine Data for Development of Turbojet Engine Analyzer. Part I. Processing of J79-7 Accelerated Service and J79-7a Engine Durability Test Data [R]. Douglass, M. E. 1964.