Published in: Facing the multicore-challenge II: Aspects of new paradigms and technologies in parallel computing
Towards High-Performance Implementations of a Custom HPC Kernel Using Intel® Array Building Blocks

Abstract

Today's highly parallel machines drive a new demand for parallel programming. Fixed power envelopes, increasing problem sizes, and new algorithms pose challenging targets for developers. HPC applications must leverage SIMD units, multi-core architectures, and heterogeneous computing platforms for optimal performance. This leads to low-level, non-portable code that is difficult to write and maintain. With Intel® Array Building Blocks (Intel ArBB), programmers focus on the high-level algorithms and rely on automatic parallelization and vectorization with strong safety guarantees. Intel ArBB hides vendor-specific hardware knowledge by runtime just-in-time (JIT) compilation. This case study on data mining with adaptive sparse grids unveils how deterministic parallelism, safety, and runtime optimization make Intel ArBB practically applicable. Hand-tuned code is about 40% faster than ArBB, but needs about 8x more code. ArBB clearly outperforms standard semi-automatically parallelized C/C++ code by approximately 6x.