
Landing CG on EARTH


Abstract

We report on our work developing a fine-grained multithreaded solution for the communication-intensive Conjugate Gradient (CG) problem. In recent work, we developed a simple yet very efficient solution for executing matrix-vector multiply (MVM) on a multithreaded system. This paper presents an effective mechanism for the reduction-broadcast phase, which is implemented and integrated with the sparse MVM, resulting in a scalable implementation of the complete CG application.
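To make the two phases named above concrete, here is a minimal serial sketch of the textbook CG iteration in plain Python. It is not the paper's EARTH implementation: the function names (`matvec`, `dot`, `conjugate_gradient`) are hypothetical, the matrix is dense rather than sparse, and nothing here is multithreaded. The point is only to show where the MVM occurs and where the inner products occur; on a parallel machine each inner product would become a reduction followed by a broadcast of the scalar result.

```python
def matvec(A, x):
    """Dense matrix-vector multiply (stand-in for the sparse, distributed MVM)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product. In a parallel CG this is a reduction followed by a broadcast."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive definite A (textbook CG)."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x, with x initially zero
    p = list(r)              # search direction
    rs_old = dot(r, r)       # reduction point
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)                  # MVM phase
        alpha = rs_old / dot(p, Ap)        # reduction point
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)                 # reduction point
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0], [1.0, 3.0]]
    b = [1.0, 2.0]
    print(conjugate_gradient(A, b))
```

Each iteration performs one MVM and two inner products, and the scalars `alpha` and `beta` are needed by every processor before the vector updates can proceed, which is why the reduction-broadcast phase matters for scalability.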

Three major observations from our experiments on the EARTH multithreaded testbed are: (1) The scalability of our CG implementation is impressive; e.g., the speedup is 90 on 120 processors for the NAS CG class B input. (2) Our dataflow-style reduction-broadcast network based on fine-grained multithreading is twice as fast as a serial reduction scheme on the same system. (3) Slowing down the network by a factor of 2 caused no notable degradation of overall CG performance.
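As a rough illustration of why a reduction network can beat a serial reduction, the sketch below compares the two in an idealized model that counts dependent steps only. The abstract does not state the topology of the reduction-broadcast network, so the binary tree used here is an assumption, and the function names (`serial_reduce`, `tree_reduce`) are hypothetical. The step counts are not the paper's measurements; the reported factor of two reflects real communication and scheduling costs that this model ignores.

```python
def serial_reduce(values):
    """Sum the values one after another; returns (total, dependent steps)."""
    total = values[0]
    steps = 0
    for v in values[1:]:
        total += v
        steps += 1
    return total, steps


def tree_reduce(values):
    """Sum the values pairwise, level by level; returns (total, rounds).

    All additions within one round are independent of each other, so they
    could run concurrently on different processors.
    """
    level = list(values)
    rounds = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd element carries over to the next round
            nxt.append(level[-1])
        level = nxt
        rounds += 1
    return level[0], rounds


if __name__ == "__main__":
    partial_sums = list(range(120))   # one hypothetical partial sum per processor
    print(serial_reduce(partial_sums))
    print(tree_reduce(partial_sums))
```

For 120 partial sums the serial scheme needs 119 dependent additions, while the tree needs 7 rounds. Separately, the speedup of 90 on 120 processors reported in observation (1) corresponds to a parallel efficiency of 90/120 = 75%.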

