Journal: Concurrency and Computation: Practice and Experience

Performance analysis of the Kahan-enhanced scalar product on current multi-core and many-core processors


Abstract

We investigate the performance characteristics of a numerically enhanced scalar product (dot) kernel loop that uses the Kahan algorithm to compensate for numerical errors, and describe efficient single instruction, multiple data (SIMD) vectorized implementations on recent multi-core and many-core processors. Using low-level instruction analysis and the execution-cache-memory performance model, we pinpoint the relevant performance bottlenecks for single-core and thread-parallel execution and predict performance and saturation behavior. We show that the Kahan-enhanced scalar product comes at almost no additional cost compared with the naive (non-Kahan) scalar product if appropriate low-level optimizations, notably SIMD vectorization and unrolling, are applied. The execution-cache-memory model is extended appropriately to accommodate not only modern Intel multicore chips but also the Intel Xeon Phi 'Knights Corner' coprocessor and an IBM POWER8 CPU. This allows us to discuss the impact of processor features on the performance across four modern architectures that are relevant for high performance computing.
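
For readers unfamiliar with the kernel, the following is a minimal scalar sketch of a naive dot product next to a Kahan-compensated one. It is an illustration only, not the SIMD-vectorized and unrolled implementations the abstract refers to. The function names `naive_dot` and `kahan_dot` and the test data are assumptions made for this example. Note that only the summation is compensated; the rounding of each individual product is not.

```python
import math


def naive_dot(a, b):
    """Plain scalar product: one multiply and one add per element."""
    s = 0.0
    for x, y in zip(a, b):
        s += x * y
    return s


def kahan_dot(a, b):
    """Scalar product with Kahan-compensated summation (illustrative sketch)."""
    s = 0.0  # running sum
    c = 0.0  # running compensation: rounding error of the previous addition
    for x, y in zip(a, b):
        prod = x * y
        corrected = prod - c          # fold the previous error back in
        t = s + corrected             # low-order bits of `corrected` may be lost here
        c = (t - s) - corrected       # recover what was actually lost
        s = t
    return s


if __name__ == "__main__":
    # Hypothetical test data: one large term followed by many tiny ones
    # that a plain running sum rounds away.
    a = [1.0] + [1e-16] * 10_000
    b = [1.0] * len(a)
    print("naive :", repr(naive_dot(a, b)))
    print("kahan :", repr(kahan_dot(a, b)))
    print("fsum  :", repr(math.fsum(a)))  # reference, valid here because b is all ones
```

Per element, the compensated loop performs one multiply and four additions or subtractions instead of one multiply and one add, which is the extra arithmetic whose cost the abstract reports as almost fully hidden once the loop is vectorized and unrolled. In compiled languages, aggressive floating-point optimization flags may simplify the compensation term away, so a real implementation would have to guard against that.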
