Journal: Concurrency, practice and experience

ImageCL: Language and source-to-source compiler for performance portability, load balancing, and scalability prediction on heterogeneous systems

Abstract

Applications written for heterogeneous CPU-GPU systems often suffer from poor performance portability. Finding good work partitions can also be challenging as different devices are suited for different applications. This article describes ImageCL, a high-level domain-specific language and source-to-source compiler, targeting single system as well as distributed heterogeneous hardware. Initially targeting image processing algorithms, our framework now also handles general stencil-based operations. It resembles OpenCL, but abstracts away performance optimization details which instead are handled by our source-to-source compiler. Machine learning-based auto-tuning is used to determine which optimizations to apply. For the distributed case, by measuring performance counters on a small input on one device, previously trained performance models are used to predict the throughput of the application on multiple different devices, making it possible to balance the load evenly. Models for the communication overhead are created in a similar fashion and used to predict the optimal number of nodes to use. ImageCL outperforms other state-of-the-art solutions on image processing benchmarks in several cases, achieving speedups of up to 4.57×. On both CPUs and GPUs we are only 3% and 2% slower than an oracle for load balancing and scalability prediction, respectively, using synthetic benchmarks.
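The load-balancing idea in the abstract (split the work according to each device's predicted throughput) can be illustrated with a minimal sketch. This is not ImageCL's implementation: the function name `partition_rows`, the device labels, and the throughput numbers are all hypothetical, and the predicted throughputs are simply passed in rather than produced by a trained performance model.

```python
def partition_rows(total_rows, predicted_throughput):
    """Split total_rows across devices in proportion to predicted throughput.

    predicted_throughput maps a device label to a non-negative number
    (hypothetical units, e.g. rows per second). In this sketch the values
    are given directly; no performance model is involved.
    """
    total = sum(predicted_throughput.values())
    if total <= 0:
        raise ValueError("predicted throughputs must sum to a positive value")

    # Ideal (fractional) share of each device.
    ideal = {d: total_rows * t / total for d, t in predicted_throughput.items()}

    # Round down, then give the leftover rows to the largest remainders
    # so that the shares always add up to total_rows.
    shares = {d: int(v) for d, v in ideal.items()}
    leftover = total_rows - sum(shares.values())
    by_remainder = sorted(ideal, key=lambda d: ideal[d] - shares[d], reverse=True)
    for d in by_remainder[:leftover]:
        shares[d] += 1
    return shares


# Hypothetical example: the GPU is predicted to be three times faster.
shares = partition_rows(1000, {"cpu": 1.0, "gpu": 3.0})
print(shares)
```

With a proportional split, every device is expected to finish at roughly the same time, which is what balancing the load evenly amounts to under these assumptions.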
