Design and Implementation of Adaptive SpMV Library for Multicore and Many-Core Architecture

Tan Guangming; Liu Junhong; Li Jiajia


Abstract

Sparse matrix-vector multiplication (SpMV) is an important computational kernel in traditional high-performance computing and emerging data-intensive applications. Previous SpMV libraries are optimized by either application-specific or architecture-specific approaches but present difficulties for use in real applications. In this work, we develop an auto-tuning system (SMATER) to bridge the gap between specific optimizations and general-purpose use. SMATER provides programmers with a unified interface based on the compressed sparse row (CSR) sparse matrix format by implicitly choosing the best format and fastest implementation for any input sparse matrix at runtime. SMATER leverages a machine-learning model and a retargetable back-end library to quickly predict the optimal combination. Performance parameters are extracted from 2,386 matrices in the SuiteSparse matrix collection. The experiments show that SMATER achieves good performance (up to 10 times that of the Intel Math Kernel Library (MKL) on Intel E5-2680 v3) while being portable across state-of-the-art x86 multicore processors, NVIDIA GPUs, and Intel Xeon Phi accelerators. Compared with Intel MKL, SMATER runs more than 2.5 times faster on average. We further demonstrate its adaptivity in an algebraic multigrid solver from the Hypre library and report a performance improvement of greater than 20%.
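For readers unfamiliar with the CSR format named in the abstract, the following is a minimal sketch of a reference CSR SpMV kernel. It is not SMATER's code or interface; the function name `csr_spmv` and the array names `row_ptr`, `col_idx`, and `values` are illustrative assumptions following the usual CSR convention.

```python
from typing import List, Sequence


def csr_spmv(row_ptr: Sequence[int],
             col_idx: Sequence[int],
             values: Sequence[float],
             x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form.

    Hypothetical reference kernel, not taken from the paper.
    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Walk only the stored nonzeros of row i.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Example matrix (assumed for illustration):
# [[4, 0, 1],
#  [0, 2, 0],
#  [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 2.0, 3.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_spmv(row_ptr, col_idx, values, x)
print(y)  # [7.0, 4.0, 18.0]
```

An auto-tuner of the kind the abstract describes would accept input in this layout and might convert it internally to another format before multiplying; that selection logic is not shown here.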


Bibliographic Details

Source
ACM Transactions on Mathematical Software, 2018, Issue 4, pp. 46.1-46.25 (25 pages)
Authors
Tan Guangming; Liu Junhong; Li Jiajia
Author Affiliations

Univ Chinese Acad Sci, Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing, Peoples R China;

Univ Chinese Acad Sci, Chinese Acad Sci, Inst Comp Technol, State Key Lab Comp Architecture, Beijing, Peoples R China;

Georgia Inst Technol, Computat Sci & Engn, Atlanta, GA 30332 USA;

Original format: PDF
Language: English
Keywords
Sparse matrix vector multiplication; auto-tuning; multicore; machine learning;


