Journal: Scientific Programming

Programming the Linpack Benchmark for the IBM PowerXCell 8i Processor


Abstract

In this paper we present the design and implementation of the Linpack benchmark for the IBM BladeCenter QS22, which incorporates two IBM PowerXCell 8i processors. The PowerXCell 8i is a new implementation of the Cell Broadband Engine™ architecture and contains a set of special-purpose processing cores known as Synergistic Processing Elements (SPEs). The SPEs can be used as computational accelerators to augment the main PowerPC processor. The added computational capability of the SPEs results in a peak double-precision floating-point capability of 108.8 GFLOPS per processor. We explain how we modified the standard open-source implementation of Linpack to accelerate key computational kernels using the SPEs of the PowerXCell 8i processors. We describe in detail the implementation and performance of the computational kernels, and explain how we employed the SPEs for high-speed data movement and reformatting. The result of these modifications is a Linpack benchmark optimized for the IBM PowerXCell 8i processor that achieves 170.7 GFLOPS on a BladeCenter QS22 with 32 GB of DDR2 SDRAM. Our implementation of Linpack also supports clusters of QS22s, and was used to achieve a result of 11.1 TFLOPS on a cluster of 84 QS22 blades. We compare our results on a single BladeCenter QS22 with the base Linpack implementation without SPE acceleration to illustrate the benefits of our optimizations.
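
To put the quoted figures in proportion, the following minimal sketch computes the fraction of peak that the reported results represent. It reads the 108.8 GFLOPS peak as a per-processor figure, which is an assumption (consistent with the 170.7 GFLOPS single-blade result exceeding it), and takes two processors per QS22 as stated above. All constant and function names are illustrative and do not come from the paper.

```python
# Back-of-the-envelope efficiency from the figures quoted in the abstract.
# ASSUMPTION: 108.8 GFLOPS is the peak of one PowerXCell 8i processor,
# so a two-processor QS22 blade peaks at twice that value.

PEAK_PER_PROCESSOR_GFLOPS = 108.8   # quoted peak, assumed per processor
PROCESSORS_PER_BLADE = 2            # a QS22 incorporates two processors
SINGLE_BLADE_GFLOPS = 170.7         # reported single-blade Linpack result
CLUSTER_TFLOPS = 11.1               # reported cluster Linpack result
CLUSTER_BLADES = 84                 # number of QS22 blades in the cluster


def blade_peak_gflops() -> float:
    """Theoretical double-precision peak of one blade, under the assumption above."""
    return PEAK_PER_PROCESSOR_GFLOPS * PROCESSORS_PER_BLADE


def single_blade_efficiency() -> float:
    """Fraction of blade peak achieved by the single-blade run."""
    return SINGLE_BLADE_GFLOPS / blade_peak_gflops()


def cluster_per_blade_gflops() -> float:
    """Average sustained rate per blade in the cluster run."""
    return CLUSTER_TFLOPS * 1000.0 / CLUSTER_BLADES


def cluster_efficiency() -> float:
    """Fraction of aggregate peak achieved by the cluster run."""
    return cluster_per_blade_gflops() / blade_peak_gflops()


if __name__ == "__main__":
    print(f"blade peak:             {blade_peak_gflops():.1f} GFLOPS")
    print(f"single-blade efficiency: {single_blade_efficiency():.1%}")
    print(f"cluster rate per blade:  {cluster_per_blade_gflops():.1f} GFLOPS")
    print(f"cluster efficiency:      {cluster_efficiency():.1%}")
```

Under that assumption, the single-blade run reaches roughly 78% of peak, and the 84-blade run sustains about 132 GFLOPS per blade, or roughly 61% of aggregate peak.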
