Published in: International Conference on High Performance Computing, Data, and Analytics

GPU-FPtuner: Mixed-precision Auto-tuning for Floating-point Applications on GPU



Abstract

GPUs have been extensively used to accelerate scientific applications from a variety of domains: computational fluid dynamics, astronomy and astrophysics, climate modeling, and numerical analysis, to name a few. Many of these applications rely on floating-point arithmetic, which is approximate in nature. High-precision libraries have been proposed to mitigate accuracy issues due to the use of floating-point arithmetic. However, these libraries offer increased accuracy at a significant performance cost. Previous work, primarily focusing on CPU code and on standard IEEE floating-point data types, has explored mixed precision as a compromise between performance and accuracy. In this work, we propose a mixed-precision autotuner for GPU applications that rely on floating-point arithmetic. Our tool supports standard 32- and 64-bit floating-point arithmetic, as well as high precision through the QD library. Our autotuner relies on compiler analysis to reduce the size of the tuning space. In particular, our tuning strategy takes into account code patterns prone to error propagation, along with GPU-specific considerations, to generate a tuning plan that balances performance and accuracy. Our autotuner pipeline, implemented using the ROSE compiler and Python scripts, is fully automated, and the code is released as open source. Our experimental results, collected on benchmark applications of varying code complexity, show the performance-accuracy tradeoffs for these applications and the effectiveness of our tool in identifying representative tuning points.
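The accuracy cost that motivates mixed-precision tuning can be seen in a toy reduction. The sketch below is not from the paper's tool; it simulates a 32-bit accumulator in plain Python (rounding through `struct` after every addition, as a hypothetical stand-in for a `float`-typed GPU reduction) and compares it against a full 64-bit accumulation:

```python
import struct

def to_f32(x):
    """Round an IEEE double to the nearest IEEE single (binary32)."""
    return struct.unpack("f", struct.pack("f", x))[0]

def reduce_f32(values):
    """Sum with the accumulator rounded to 32 bits after every addition,
    mimicking a float-typed reduction kernel (hypothetical stand-in)."""
    acc = 0.0
    for v in values:
        acc = to_f32(acc + to_f32(v))
    return acc

def reduce_f64(values):
    """The same reduction carried out entirely in 64-bit precision."""
    acc = 0.0
    for v in values:
        acc += v
    return acc

n = 1_000_000
values = [0.1] * n  # exact sum is 100000
err32 = abs(reduce_f32(values) - 100000.0)
err64 = abs(reduce_f64(values) - 100000.0)
print(f"float32 accumulator error: {err32}")
print(f"float64 accumulator error: {err64}")
```

Because 0.1 is not exactly representable and each 32-bit add rounds the growing accumulator, the single-precision error dwarfs the double-precision one. A mixed-precision autotuner exploits the converse: many intermediate values tolerate the cheaper 32-bit format, so precision can be lowered selectively where error does not propagate into the final result.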
