Published in: Future Generation Computer Systems

Architectural investigation of matrix data layout on multicore processors


Abstract

Many practical applications rely on matrix operations as essential procedures, and recent studies of these operations use parallel processing to reduce computation time. Because matrix operations are highly data intensive, many studies have investigated work distribution techniques and data access latency to accelerate the algorithms. However, previous studies have not adequately considered hardware architectural features, even though these features greatly affect the performance of matrix operations. The present study therefore examines the architectural characteristics that affect the performance of matrix operations on real multicore processors. As test applications we use matrix multiplication, LU decomposition, and Cholesky factorization, which are well-known data-intensive mathematical algorithms used in many fields. We argue that these applications access matrices only in a particular direction, and we propose that the canonical data layout is a better matrix data layout than the block data layout. In addition, tiling is used to increase temporal data locality in the multilevel caches and to balance the workload as evenly as possible across the cores of a multicore processor. Our experimental results show that, compared with applications using the block data layout, applications using the canonical data layout with tiling achieve an 8.23% shorter execution time and a 3.91% last-level cache miss rate.
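The abstract contrasts the canonical data layout with the block data layout but does not define them. The sketch below is a minimal illustration, not the authors' code: it assumes an n × n matrix stored in one flat array, takes "canonical" to mean row-major order, and takes "block" to mean square b × b tiles stored contiguously with the tiles themselves in row-major order (n assumed divisible by b). All function names are hypothetical.

```python
def canonical_index(i, j, n):
    """Row-major (canonical) layout: row i occupies offsets i*n .. i*n + n - 1."""
    return i * n + j


def block_index(i, j, n, b):
    """Block data layout (assumed): each b x b tile is contiguous and tiles
    are ordered row-major. Assumes n is a multiple of b."""
    tiles_per_row = n // b
    tile = (i // b) * tiles_per_row + (j // b)   # which tile holds (i, j)
    return tile * b * b + (i % b) * b + (j % b)  # offset of (i, j) inside it


def to_block_layout(a, n, b):
    """Copy a row-major flat matrix into the assumed block layout."""
    out = [0] * (n * n)
    for i in range(n):
        for j in range(n):
            out[block_index(i, j, n, b)] = a[canonical_index(i, j, n)]
    return out
```

For n = 4 and b = 2, walking along row 0 touches offsets 0, 1, 2, 3 in the canonical layout but 0, 1, 4, 5 in the block layout, which is one way to see why an application that traverses rows end to end may favor the canonical layout.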
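To make the tiling idea in the abstract concrete, here is a minimal sketch of tiled matrix multiplication on a row-major (canonical) flat array, together with one simple way to spread row tiles across cores. The tile size t, the round-robin assignment, and all function names are assumptions for illustration; the abstract does not specify the authors' tiling scheme, and pure Python shows only the loop structure, not the cache effects.

```python
def matmul_naive(a, b, n):
    """C = A * B for row-major n x n flat lists, using the i-k-j loop order."""
    c = [0.0] * (n * n)
    for i in range(n):
        for k in range(n):
            aik = a[i * n + k]
            for j in range(n):
                c[i * n + j] += aik * b[k * n + j]
    return c


def matmul_tiled(a, b, n, t):
    """Same product, but computed one t x t tile at a time so that the
    tiles of A, B, and C in use can stay resident in cache (t need not divide n)."""
    c = [0.0] * (n * n)
    for ii in range(0, n, t):
        for kk in range(0, n, t):
            for jj in range(0, n, t):
                for i in range(ii, min(ii + t, n)):
                    for k in range(kk, min(kk + t, n)):
                        aik = a[i * n + k]
                        for j in range(jj, min(jj + t, n)):
                            c[i * n + j] += aik * b[k * n + j]
    return c


def partition_row_tiles(n, t, cores):
    """Hypothetical workload split: hand out the starting rows of the row
    tiles round-robin, so each core owns whole rows of C and no two cores
    write the same output element."""
    tiles = list(range(0, n, t))
    return [tiles[c::cores] for c in range(cores)]
```

Round-robin assignment keeps the number of row tiles per core within one of each other; a real implementation would run each core's share in its own thread and would tune t to the cache sizes of the target processor.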
