A superlinear speedup region for matrix multiplication

Gusev Marjan; Ristov Sasko

Abstract

Modern processors are built on a multicore architecture with an increasing number of cores per processor. Multicore processors are often designed so that some level of the cache hierarchy is shared among cores. Usually, the last-level cache (e.g., the L3 cache) is shared among several or all cores, and each core has private lower-level caches (e.g., L1 and L2 caches). Superlinear speedup is possible for a matrix multiplication algorithm executed on a shared memory multiprocessor because of the existence of a superlinear region: a region where the cache requirements for matrix storage cause the sequential execution to incur more cache misses than the parallel execution. This paper shows theoretically and experimentally that there is a region where superlinear speedup can be achieved. We provide a theoretical proof of the existence of superlinear speedup and determine the boundaries of the region where it can be achieved. The experiments confirm our theoretical results. These results will therefore have an impact on future software development and the exploitation of parallel hardware based on a shared memory multiprocessor architecture. Copyright © 2013 John Wiley & Sons, Ltd.
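
The idea of a superlinear region can be illustrated with a minimal Python sketch. It is not the authors' model and does not reproduce the boundaries derived in the paper. It assumes a toy working-set estimate: square n x n matrices of 8-byte elements, the rows of A and C split evenly across cores, every core reading all of B, and a single private cache size per core. The function names and the numeric inputs in the usage note are hypothetical and chosen only for illustration.

```python
def speedup(t_seq: float, t_par: float) -> float:
    """Speedup S(P) = T_1 / T_P for measured run times."""
    return t_seq / t_par


def is_superlinear(t_seq: float, t_par: float, cores: int) -> bool:
    """Speedup is superlinear when S(P) exceeds the number of cores P."""
    return speedup(t_seq, t_par) > cores


def working_set_bytes(n: int, cores: int = 1, elem_bytes: int = 8) -> float:
    """Toy per-core working set for C = A * B with n x n matrices.

    Assumption (not from the paper): each core holds all of B plus
    its 1/cores share of the rows of A and C.
    """
    return elem_bytes * (n * n + 2 * n * n / cores)


def in_candidate_region(n: int, cores: int, private_cache_bytes: int,
                        elem_bytes: int = 8) -> bool:
    """True when the per-core working set fits in a private cache
    but the sequential working set does not (toy criterion)."""
    sequential = working_set_bytes(n, 1, elem_bytes)
    parallel = working_set_bytes(n, cores, elem_bytes)
    return parallel <= private_cache_bytes < sequential
```

For example, under these assumptions `in_candidate_region(100, 4, 200_000)` compares a per-core working set of 120,000 bytes and a sequential working set of 240,000 bytes against a hypothetical 200,000-byte private cache. Whether a real run is superlinear still has to be checked from measured times with `is_superlinear`.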

Bibliographic details

Source
Concurrency and computation: practice and experience | 2014, No. 11 | pp. 1847-1868 | 22 pages
Authors
Gusev Marjan; Ristov Sasko
Author affiliations

Faculty of Computer Sciences and Engineering, Ss. Cyril and Methodius University, Skopje, Macedonia;

Faculty of Computer Sciences and Engineering, Ss. Cyril and Methodius University, Skopje, Macedonia;

Original format: PDF
Language: English
Keywords
Gustafson's law; Amdahl's law; shared memory multiprocessor; cache memory; high performance computing;

Similar literature

1. Unconventional Wisdom: Superlinear Speedup and Inherently Parallel Computations [J]. Akl Selim G. International journal of unconventional computing. 2018, No. 4a5

2. Superlinear speedup phenomenon in parallel 3D Discrete Element Method (DEM) simulations of complex-shaped particles [J]. Yan Beichuan, Regueiro Richard A. Parallel Computing. 2018, No. JULa

3. Comparison between O(n^2) and O(n) neighbor search algorithm and its influence on superlinear speedup in parallel discrete element method (DEM) for complex-shaped particles [J]. Yan Beichuan, Regueiro Richard. Engineering Computations. 2018, No. 6

4. Superlinear speedup for matrix multiplication [C]. Ristov Sasko, Gusev Marjan. Proceedings of the ITI 2012 34th International Conference on Information Technology Interfaces. 2012

5. Optimizing Tall-and-skinny Matrix-matrix Multiplication on GPUs [D]. Xiong, Nan. 2018

6. HIERARCHICAL ORTHOGONAL MATRIX GENERATION AND MATRIX-VECTOR MULTIPLICATIONS IN RIGID BODY SIMULATIONS [O] . FUHUI FANG, JINGFANG HUANG, GARY HUBER, -1

7. Superlinear Speedup in HPC Systems: why and when? [O] . Sasko Ristov, Radu Prodan, Marjan Gusev, 2016
