Optimizing Sparse Tensor Times Matrix on Multi-core and Many-Core Architectures

6th Workshop on Irregular Applications: Architecture and Algorithms


Abstract

This paper presents the optimized design and implementation of sparse tensor-times-dense matrix multiply (SpTTM) for CPU and GPU platforms. This primitive is a critical bottleneck in data analysis and mining applications based on tensor methods, such as the Tucker decomposition. We first design and implement a sequential SpTTM that avoids the explicit data transformation between a tensor and a matrix required by the conventional approach. We further optimize SpTTM on multicore CPU and GPU systems by parallelizing, avoiding locks, and exploiting data locality. Our sequential SpTTM is up to 3.5× faster than the SpTTM from Tensor Toolbox and 1.5× faster than that from Cyclops Tensor Framework. Our parallel algorithms achieve 4.1× speedup on a multicore Intel Core i7 and 18.8× speedup on an NVIDIA K40c GPU over our sequential SpTTM, respectively.
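To make the primitive concrete, below is a minimal sequential sketch of a mode-3 SpTTM on a third-order COO tensor. It works on the nonzeros in place rather than unfolding the tensor into a matrix, in the spirit of the transformation-free approach the abstract describes. The `coo_tensor` struct, the assumption that nonzeros are pre-sorted by `(i, j)`, and the semi-sparse output layout (one dense length-R row per nonzero mode-3 fiber) are illustrative choices for this sketch, not the paper's actual data structures or code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical COO storage for a third-order sparse tensor X (I x J x K).
 * Nonzeros are assumed pre-sorted by (i, j), so all entries of a
 * mode-3 fiber X(i, j, :) are contiguous. */
typedef struct {
    size_t nnz;
    size_t *ind_i, *ind_j, *ind_k; /* coordinates of each nonzero */
    double *val;                    /* nonzero values */
} coo_tensor;

/* Mode-3 SpTTM with a dense K x R matrix U (row-major):
 *   Y(i, j, r) += X(i, j, k) * U(k, r)   for r = 0..R-1.
 * The output is semi-sparse: one dense length-R row per distinct
 * (i, j) fiber, stored contiguously in Y_val, with the fiber's
 * coordinates recorded in Y_i / Y_j. Returns the fiber count.
 * Callers must size Y_i, Y_j, Y_val for up to nnz fibers. */
size_t spttm_mode3(const coo_tensor *X, const double *U, size_t R,
                   size_t *Y_i, size_t *Y_j, double *Y_val)
{
    size_t nfib = 0;
    for (size_t z = 0; z < X->nnz; ++z) {
        /* Start a new output fiber whenever (i, j) changes. */
        if (z == 0 || X->ind_i[z] != X->ind_i[z - 1]
                   || X->ind_j[z] != X->ind_j[z - 1]) {
            Y_i[nfib] = X->ind_i[z];
            Y_j[nfib] = X->ind_j[z];
            memset(Y_val + nfib * R, 0, R * sizeof(double));
            ++nfib;
        }
        const double *u_row = U + X->ind_k[z] * R; /* row k of U */
        double *y_row = Y_val + (nfib - 1) * R;
        double v = X->val[z];
        for (size_t r = 0; r < R; ++r)
            y_row[r] += v * u_row[r];
    }
    return nfib;
}

int main(void) {
    /* Toy 2 x 2 x 2 tensor with 3 nonzeros, sorted by (i, j). */
    size_t ii[] = {0, 0, 1}, jj[] = {0, 0, 1}, kk[] = {0, 1, 0};
    double vv[] = {1.0, 2.0, 3.0};
    coo_tensor X = {3, ii, jj, kk, vv};
    size_t R = 2;
    double U[] = {1.0, 2.0,   /* row k = 0 */
                  3.0, 4.0};  /* row k = 1 */
    size_t Y_i[3], Y_j[3];
    double Y_val[3 * 2];
    size_t nfib = spttm_mode3(&X, U, R, Y_i, Y_j, Y_val);
    for (size_t f = 0; f < nfib; ++f)
        printf("Y(%zu, %zu, :) = [%g, %g]\n",
               Y_i[f], Y_j[f], Y_val[f * R], Y_val[f * R + 1]);
    return 0; /* prints Y(0,0,:) = [7, 10] and Y(1,1,:) = [3, 6] */
}
```

Note that in this layout each (i, j) fiber owns a distinct output row, so a parallel version could assign whole fibers to threads and update Y without locks; that is one way to realize the lock-free, locality-aware parallelization the abstract mentions, though the paper's actual CPU and GPU schedules may differ.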
