Reproducible and Accurate Matrix Multiplication

Abstract

Because floating-point operations are non-associative and parallel architectures schedule work dynamically, it is challenging to obtain a bit-wise reproducible floating-point result across multiple executions of the same code on different, or even similar, parallel architectures. In this paper, we address the problem of reproducibility in the context of matrix multiplication and propose an algorithm that yields both reproducible and accurate results. The algorithm is composed of two main stages: a filtering stage that uses fast vectorized floating-point expansions in conjunction with error-free transformations, and an accumulation stage based on Kulisch long accumulators in a high-radix carry-save representation. Finally, we provide implementations and performance results in parallel environments such as GPUs.
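The ideas named in the abstract can be illustrated with a small Python sketch. It is not the paper's implementation: `two_sum` is the classic TwoSum error-free transformation (one building block of floating-point expansions), and `exact_dot` uses Python's `Fraction` as a slow, exact stand-in for a Kulisch long accumulator. All function names here are hypothetical, and the vectorized filtering stage and the carry-save representation are not modeled.

```python
from fractions import Fraction


def two_sum(a, b):
    """Error-free transformation (Knuth's TwoSum).

    Returns (s, e) where s is the rounded sum fl(a + b) and e is the
    rounding error, so that s + e equals a + b exactly (barring overflow).
    """
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def naive_sum(values):
    """Left-to-right floating-point sum; the result depends on the order."""
    total = 0.0
    for v in values:
        total += v
    return total


def exact_dot(xs, ys):
    """Dot product accumulated exactly, rounded once at the end.

    Assumption: Fraction plays the role of a long accumulator here.
    Inputs must be finite floats (Fraction rejects inf and nan).
    """
    acc = Fraction(0)
    for x, y in zip(xs, ys):
        acc += Fraction(x) * Fraction(y)  # each product is exact
    return float(acc)  # single, correctly rounded conversion


def matmul_exact(a, b):
    """Matrix product on lists of lists using exact_dot for each entry."""
    cols = list(zip(*b))
    return [[exact_dot(row, col) for col in cols] for row in a]


# Non-associativity: the same four numbers summed in two orders.
print(naive_sum([1e16, 1.0, -1e16, 1.0]))   # 1.0
print(naive_sum([1.0, 1.0, 1e16, -1e16]))   # 2.0

# TwoSum recovers the bit that plain addition drops.
print(two_sum(1e16, 1.0))                    # (1e+16, 1.0)

# Exact accumulation gives the same answer regardless of order.
print(exact_dot([1e16, 1.0, -1e16, 1.0], [1.0, 1.0, 1.0, 1.0]))  # 2.0
```

Because every partial result is held exactly and rounding happens only once, `matmul_exact` returns bit-identical entries however the inner dimension is ordered, which is the reproducibility property the abstract targets. A long accumulator achieves the same effect with fixed-width integer arithmetic rather than arbitrary-precision rationals.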
