Optimizing Sparse Tensor Times Matrix on Multi-core and Many-Core Architectures

6th Workshop on Irregular Applications: Architecture and Algorithms


Abstract

This paper presents the optimized design and implementation of sparse tensor-times-dense matrix multiply (SpTTM) for CPU and GPU platforms. This primitive is a critical bottleneck in data analysis and mining applications based on tensor methods, such as the Tucker decomposition. We first design and implement a sequential SpTTM that avoids the explicit data transformation between a tensor and a matrix required by the conventional approach. We further optimize SpTTM on multicore CPU and GPU systems by parallelizing, avoiding locks, and exploiting data locality. Our sequential SpTTM is up to 3.5× faster than the SpTTM from Tensor Toolbox and 1.5× faster than that from Cyclops Tensor Framework. Our parallel algorithms achieve 4.1× speedup on a multicore Intel Core i7 and 18.8× speedup on an NVIDIA K40c GPU over our sequential SpTTM, respectively.
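To make the primitive concrete, below is a minimal sequential sketch of a mode-3 SpTTM on a third-order COO tensor. It works on the nonzeros in place rather than unfolding the tensor into a matrix, in the spirit of the transformation-free approach the abstract describes. The `coo_tensor` struct, the assumption that nonzeros are pre-sorted by `(i, j)`, and the semi-sparse output layout (one dense length-R row per nonzero mode-3 fiber) are illustrative choices for this sketch, not the paper's actual data structures or code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical COO storage for a third-order sparse tensor X (I x J x K).
 * Nonzeros are assumed pre-sorted by (i, j), so all entries of a
 * mode-3 fiber X(i, j, :) are contiguous. */
typedef struct {
    size_t nnz;
    size_t *ind_i, *ind_j, *ind_k; /* coordinates of each nonzero */
    double *val;                    /* nonzero values */
} coo_tensor;

/* Mode-3 SpTTM with a dense K x R matrix U (row-major):
 *   Y(i, j, r) += X(i, j, k) * U(k, r)   for r = 0..R-1.
 * The output is semi-sparse: one dense length-R row per distinct
 * (i, j) fiber, stored contiguously in Y_val, with the fiber's
 * coordinates recorded in Y_i / Y_j. Returns the fiber count.
 * Callers must size Y_i, Y_j, Y_val for up to nnz fibers. */
size_t spttm_mode3(const coo_tensor *X, const double *U, size_t R,
                   size_t *Y_i, size_t *Y_j, double *Y_val)
{
    size_t nfib = 0;
    for (size_t z = 0; z < X->nnz; ++z) {
        /* Start a new output fiber whenever (i, j) changes. */
        if (z == 0 || X->ind_i[z] != X->ind_i[z - 1]
                   || X->ind_j[z] != X->ind_j[z - 1]) {
            Y_i[nfib] = X->ind_i[z];
            Y_j[nfib] = X->ind_j[z];
            memset(Y_val + nfib * R, 0, R * sizeof(double));
            ++nfib;
        }
        const double *u_row = U + X->ind_k[z] * R; /* row k of U */
        double *y_row = Y_val + (nfib - 1) * R;
        double v = X->val[z];
        for (size_t r = 0; r < R; ++r)
            y_row[r] += v * u_row[r];
    }
    return nfib;
}

int main(void) {
    /* Toy 2 x 2 x 2 tensor with 3 nonzeros, sorted by (i, j). */
    size_t ii[] = {0, 0, 1}, jj[] = {0, 0, 1}, kk[] = {0, 1, 0};
    double vv[] = {1.0, 2.0, 3.0};
    coo_tensor X = {3, ii, jj, kk, vv};
    size_t R = 2;
    double U[] = {1.0, 2.0,   /* row k = 0 */
                  3.0, 4.0};  /* row k = 1 */
    size_t Y_i[3], Y_j[3];
    double Y_val[3 * 2];
    size_t nfib = spttm_mode3(&X, U, R, Y_i, Y_j, Y_val);
    for (size_t f = 0; f < nfib; ++f)
        printf("Y(%zu, %zu, :) = [%g, %g]\n",
               Y_i[f], Y_j[f], Y_val[f * R], Y_val[f * R + 1]);
    return 0; /* prints Y(0,0,:) = [7, 10] and Y(1,1,:) = [3, 6] */
}
```

Note that in this layout each (i, j) fiber owns a distinct output row, so a parallel version could assign whole fibers to threads and update Y without locks; that is one way to realize the lock-free, locality-aware parallelization the abstract mentions, though the paper's actual CPU and GPU schedules may differ.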
