Published in: Proceedings of the 2013 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2013)

SMAT: An Input Adaptive Auto-Tuner for Sparse Matrix-Vector Multiplication


Abstract

Sparse matrix-vector multiplication (SpMV) is an important kernel in both traditional high-performance computing and emerging data-intensive applications. To date, SpMV libraries have been optimized with either application-specific or architecture-specific approaches, making them too complicated to be used widely in real applications. In this work we develop a Sparse Matrix-vector multiplication Auto-Tuning system (SMAT) to bridge the gap between specific optimizations and general-purpose usage. SMAT provides users with a unified programming interface in compressed sparse row (CSR) format and automatically determines the optimal storage format and implementation for any input sparse matrix at runtime. To this end, SMAT leverages a learning model, generated in an off-line stage by a machine-learning method from a training set of more than 2000 matrices in the UF sparse matrix collection, to quickly predict the best combination of matrix feature parameters. Our experiments show that SMAT achieves performance of up to 51 GFLOPS in single precision and 37 GFLOPS in double precision on mainstream x86 multi-core processors, both more than 3 times faster than the Intel MKL library. We also demonstrate its adaptability in an algebraic multigrid solver from the Hypre library, with more than 20% performance improvement reported.
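The CSR format that SMAT exposes as its unified interface stores a sparse matrix as three arrays: the nonzero values, their column indices, and per-row offsets into those arrays. A minimal sketch of a CSR SpMV kernel (illustrative only; the names `spmv_csr`, `values`, `col_idx`, and `row_ptr` are our own, not from the paper or any SMAT API):

```python
import numpy as np

def spmv_csr(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    values  : nonzero entries of A, row by row
    col_idx : column index of each entry in `values`
    row_ptr : row i's entries occupy values[row_ptr[i]:row_ptr[i+1]]
    """
    n = len(row_ptr) - 1
    y = np.zeros(n)
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

# 3x3 example matrix [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
values  = np.array([4.0, 1.0, 2.0, 3.0, 5.0])
col_idx = np.array([0, 2, 1, 0, 2])
row_ptr = np.array([0, 2, 3, 5])
x = np.array([1.0, 2.0, 3.0])
print(spmv_csr(values, col_idx, row_ptr, x))  # [ 7.  4. 18.]
```

An auto-tuner like SMAT takes input in this one format and, based on features of the matrix (e.g., nonzero distribution across rows), may internally convert it to another storage format whose SpMV implementation runs faster on the target processor.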
