Conference: IEEE International Congress on Big Data

Spark-GPU: An accelerated in-memory data processing engine on clusters

Abstract

Apache Spark is an in-memory data processing system that supports both SQL queries and advanced analytics over large data sets. In this paper, we present our design and implementation of Spark-GPU that enables Spark to utilize GPU's massively parallel processing ability to achieve both high performance and high throughput. Spark-GPU transforms a general-purpose data processing system into a GPU-supported system by addressing several real-world technical challenges including minimizing internal and external data transfers, preparing a suitable data format and a batching mode for efficient GPU execution, and determining the suitability of workloads for GPU with a task scheduling capability between CPU and GPU. We have comprehensively evaluated Spark-GPU with a set of representative analytical workloads to show its effectiveness. Our results show that Spark-GPU improves the performance of machine learning workloads by up to 16.13x and the performance of SQL queries by up to 4.83x.
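To make the "suitable data format and a batching mode" idea concrete, here is a minimal sketch in plain Python. It is not taken from Spark-GPU: the function name `to_columnar_batches` and the `batch_size` parameter are hypothetical, and the abstract does not say which layout the system actually uses. The sketch only illustrates the general technique of regrouping row-at-a-time records into fixed-size, column-oriented batches, a layout that is commonly preferred for bulk transfer to and processing on a GPU.

```python
from itertools import islice
from typing import Iterable, Iterator, Sequence, Tuple


def to_columnar_batches(
    rows: Iterable[Sequence], batch_size: int
) -> Iterator[Tuple[list, ...]]:
    """Group row-oriented records into fixed-size, column-oriented batches.

    Hypothetical helper for illustration only; not Spark-GPU's API.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        # Pull up to batch_size rows from the row iterator.
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        # zip(*chunk) transposes the rows, giving one list per column.
        yield tuple(list(column) for column in zip(*chunk))


if __name__ == "__main__":
    records = [(1, "a"), (2, "b"), (3, "c")]
    for batch in to_columnar_batches(records, batch_size=2):
        print(batch)
```

Each yielded batch holds one list per column, so a whole column can be copied in a single transfer instead of one value per row. A real system would use typed, contiguous buffers rather than Python lists.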
