An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU


Abstract

We present a multi-stage method for solving large tridiagonal systems on the GPU. Previously, large tridiagonal systems could not be solved efficiently because of the limited size of on-chip shared memory. We tackle this problem by splitting the systems into smaller ones and then solving them on-chip. The multi-stage nature of our method, together with varied workloads and GPUs of different capabilities, calls for an auto-tuning strategy that carefully selects the switch points between computation stages. In particular, we show two ways to effectively prune the tuning space and thus avoid an impractical exhaustive search: (1) apply algorithmic knowledge to decouple tuning parameters, and (2) estimate search starting points based on GPU architecture parameters. We demonstrate that auto-tuning is a powerful tool that improves performance by up to 5x, saves 17% and 32% of execution time on average over static and dynamic tuning, respectively, and enables our multi-stage solver to outperform the Intel MKL tridiagonal solver on many parallel tridiagonal systems by 6-11x.
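The splitting idea in the abstract can be illustrated with a small serial sketch. The abstract does not say which splitting algorithm is used, so this example assumes one step of parallel cyclic reduction (PCR), a standard technique that turns one tridiagonal system into two independent half-size systems (even-indexed and odd-indexed unknowns). Each half is then solved with the Thomas algorithm, standing in for the on-chip solve. The function names `thomas`, `pcr_step`, and `solve_by_splitting` are hypothetical, and the code assumes `a[0] == 0`, `c[-1] == 0`, and a diagonally dominant matrix so that no pivoting is needed.

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system serially (Thomas algorithm).

    a: sub-diagonal (a[0] unused, assumed 0), b: diagonal,
    c: super-diagonal (c[-1] assumed 0), d: right-hand side.
    """
    n = len(b)
    if n == 0:
        return []
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):  # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def pcr_step(a, b, c, d):
    """One PCR step: eliminate x[i-1] and x[i+1] from equation i.

    Afterwards equation i couples only x[i-2], x[i], x[i+2], so the
    even and odd unknowns form two independent tridiagonal systems.
    """
    n = len(b)
    a2, b2, c2, d2 = [0.0] * n, [0.0] * n, [0.0] * n, [0.0] * n
    for i in range(n):
        k1 = a[i] / b[i - 1] if i > 0 else 0.0      # scales equation i-1
        k2 = c[i] / b[i + 1] if i < n - 1 else 0.0  # scales equation i+1
        lo_a = a[i - 1] if i > 0 else 0.0
        lo_c = c[i - 1] if i > 0 else 0.0
        lo_d = d[i - 1] if i > 0 else 0.0
        hi_a = a[i + 1] if i < n - 1 else 0.0
        hi_c = c[i + 1] if i < n - 1 else 0.0
        hi_d = d[i + 1] if i < n - 1 else 0.0
        a2[i] = -lo_a * k1
        b2[i] = b[i] - lo_c * k1 - hi_a * k2
        c2[i] = -hi_c * k2
        d2[i] = d[i] - lo_d * k1 - hi_d * k2
    return a2, b2, c2, d2


def solve_by_splitting(a, b, c, d):
    """Split once with PCR, then solve each half independently."""
    a2, b2, c2, d2 = pcr_step(a, b, c, d)
    x = [0.0] * len(b)
    for start in (0, 1):  # even-indexed, then odd-indexed unknowns
        x[start::2] = thomas(a2[start::2], b2[start::2],
                             c2[start::2], d2[start::2])
    return x
```

In a GPU setting the splitting step would presumably be repeated until each subsystem fits in shared memory, and the number of repetitions is the kind of switch point an auto-tuner would search over. This sketch performs a single split and runs serially.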


Bibliographic details

Source: 2011 25th IEEE International Parallel & Distributed Processing Symposium | 2011 | pp. 956-965 | 10 pages
Authors: Davidson Andrew; Zhang Yao; Owens John D.
Original format: PDF
Chinese Library Classification: TP311.133

Similar literature


1. On solving separable block tridiagonal linear systems using a GPU implementation of radix-4 PSCR method [J]. M. Myllykoski, T. Rossi, J. Toivanen. Journal of Parallel and Distributed Computing. 2018, May issue
2. A parallel solving method for block-tridiagonal equations on CPU-GPU heterogeneous computing systems [J]. Yang Wangdong, Li Kenli, Li Keqin. Journal of Supercomputing. 2017, No. 5
3. Implementing and Evaluating an Heterogeneous, Scalable, Tridiagonal Linear System Solver with OpenCL to Target FPGAs, GPUs, and CPUs [J]. Macintosh Hamish J., Banks Jasmine E., Kelson Neil A. International Journal of Reconfigurable Computing. 2019, issue PTa1
4. An Auto-tuned Method for Solving Large Tridiagonal Systems on the GPU [C]. Davidson Andrew, Zhang Yao, Owens John D. IEEE International Parallel & Distributed Processing Symposium. 2011
5. Accelerating the discontinuous Galerkin cell-vertex scheme (DG-CVS) solver on CPU-GPU heterogeneous systems [D]. Hu, Xiaoqi. 2017
6. Modern architectures for intelligent systems: reusable ontologies and problem-solving methods [O]. M. A. Musen. 1998
7. An auto-tuned method for solving large tridiagonal systems on the GPU [O]. Andrew Davidson, Yao Zhang, John D. Owens. 2011
