NUMA-aware multicore Matrix Multiplication.

Abstract

Emerging Symmetric Multi-Processing (SMP) architectures have shifted from the shared-bus topology to Cache-coherent Non-Uniform Memory Access (ccNUMA), in which the Processing Elements (PEs) access the distributed memories with different delays. This shift can affect the performance of current SMP software packages and the way they address the complexity of the new architecture. In this work, we compare our user-level thread scheduling mechanism [16] with the OpenMP scheduler when multiplying two large matrices on a dual-socket NUMA architecture. We analyze and evaluate an optimized, multithreaded implementation of the Level-3 BLAS general matrix multiplication routine (DGEMM). We show that making such a memory-intensive operation aware of the architecture minimizes memory bottlenecks and improves cache utilization and, consequently, overall performance. We present one approach to thread scheduling and data alignment that reduces the number of cache misses to one-third of those in the non-tuned implementation and reduces the required computation time by up to 22%. Finally, we show the relationship between the number of cache misses and the speedup gained by our implementation, which proves our hypothesis about the data locality problem and the memory bottleneck in a non-NUMA-aware implementation.
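
The locality idea in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (`row_blocks`, `multiply_block`, `partitioned_matmul`), the tile size, and the choice of contiguous row blocks are assumptions made for illustration. Each worker owns one contiguous block of rows of the result and walks it tile by tile, which is the kind of partitioning a NUMA-aware scheduler could pair with core pinning and local memory allocation. Python threads control neither of those, so the sketch shows only the work decomposition, not any performance effect.

```python
from concurrent.futures import ThreadPoolExecutor


def row_blocks(n_rows, n_workers):
    """Split range(n_rows) into contiguous, near-equal blocks, one per worker."""
    base, extra = divmod(n_rows, n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def multiply_block(a, b, c, rows, tile):
    """Accumulate rows [lo, hi) of c += a @ b, tiling the k and j loops."""
    lo, hi = rows
    n_inner, n_cols = len(b), len(b[0])
    for kk in range(0, n_inner, tile):
        for jj in range(0, n_cols, tile):
            for i in range(lo, hi):
                a_row, c_row = a[i], c[i]
                for k in range(kk, min(kk + tile, n_inner)):
                    a_ik, b_row = a_row[k], b[k]
                    for j in range(jj, min(jj + tile, n_cols)):
                        c_row[j] += a_ik * b_row[j]


def partitioned_matmul(a, b, n_workers=2, tile=32):
    """Multiply list-of-lists matrices; each worker writes only its own rows of c."""
    c = [[0.0] * len(b[0]) for _ in a]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(multiply_block, a, b, c, rows, tile)
            for rows in row_blocks(len(a), n_workers)
        ]
        for f in futures:
            f.result()  # re-raise any worker exception
    return c
```

Because every worker writes a disjoint set of rows, no locking is needed on the result. In a native implementation, the same decomposition is what would allow each thread's rows to be allocated in the memory attached to the socket that thread runs on.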

Bibliographic record

  • Author

    Alkowaileet, Wail Yousef

  • Author affiliation

    University of California, Irvine

  • Degree-granting institution: University of California, Irvine
  • Discipline: Computer Science
  • Degree: M.S.
  • Year: 2013
  • Pages: 51 p.
  • Original format: PDF
  • Language: English