Journal: Concurrency and Computation: Practice and Experience

A superlinear speedup region for matrix multiplication


Abstract

Modern processors are built on multicore architectures with an increasing number of cores per processor. Multicore processors are often designed so that some level of the cache hierarchy is shared among cores: usually the last-level cache (e.g., L3) is shared among several or all cores, while each core has its own private lower-level caches (e.g., L1 and L2). Superlinear speedup is possible for a matrix multiplication algorithm executed on a shared-memory multiprocessor because of the existence of a superlinear region, that is, a region where the cache requirements for matrix storage cause the sequential execution to incur more cache misses than the parallel execution. This paper shows theoretically and experimentally that such a region exists. We provide a theoretical proof of the existence of superlinear speedup and determine the boundaries of the region where it can be achieved. The experiments confirm our theoretical results. These results will therefore have an impact on future software development and on the exploitation of parallel hardware based on a shared-memory multiprocessor architecture. Copyright © 2013 John Wiley & Sons, Ltd.
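The idea of a superlinear region can be illustrated with a minimal Python sketch. It is not the paper's model: the abstract does not state the region's boundaries, so the criterion below is a deliberately crude assumption (the three n x n matrices overflow one core's private cache but fit in the combined private caches of all cores). The function names, the 8-byte element size, and the cache sizes are hypothetical and chosen only for illustration.

```python
def speedup(t_seq: float, t_par: float) -> float:
    """Classical speedup: sequential time divided by parallel time."""
    return t_seq / t_par


def is_superlinear(t_seq: float, t_par: float, cores: int) -> bool:
    """Speedup is superlinear when it exceeds the number of cores used."""
    return speedup(t_seq, t_par) > cores


def working_set_bytes(n: int, elem_size: int = 8) -> int:
    """Storage for three n x n matrices (A, B and C = A * B).

    elem_size=8 assumes double-precision elements (an assumption).
    """
    return 3 * n * n * elem_size


def in_candidate_region(n: int, cores: int, private_cache_bytes: int,
                        elem_size: int = 8) -> bool:
    """Toy criterion, NOT the boundaries derived in the paper.

    Assumes the working set is split evenly across cores, so a size n is a
    candidate when the data overflows one private cache but fits in the
    aggregate private cache of all cores.
    """
    ws = working_set_bytes(n, elem_size)
    return private_cache_bytes < ws <= cores * private_cache_bytes


if __name__ == "__main__":
    cache = 128 * 1024  # hypothetical 128 KiB private cache per core
    for n in (50, 100, 200):
        print(n, in_candidate_region(n, cores=4, private_cache_bytes=cache))
```

In practice the measured times passed to `is_superlinear` would come from timing the sequential and parallel runs on the target machine. The even-split assumption ignores data that every core must read, so the toy region is only a rough first guess.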

