The Pochoir Stencil Compiler


Abstract

A stencil computation repeatedly updates each point of a d-dimensional grid as a function of itself and its near neighbors. Parallel cache-efficient stencil algorithms based on "trapezoidal decompositions" are known, but most programmers find them difficult to write. The Pochoir stencil compiler allows a programmer to write a simple specification of a stencil in a domain-specific stencil language embedded in C++, which the Pochoir compiler then translates into high-performing Cilk code that employs an efficient parallel cache-oblivious algorithm. Pochoir supports general d-dimensional stencils and handles both periodic and aperiodic boundary conditions in one unified algorithm. The Pochoir system provides a C++ template library that allows the user's stencil specification to be executed directly in C++ without the Pochoir compiler (albeit more slowly), which simplifies user debugging and greatly simplified the implementation of the Pochoir compiler itself. A host of stencil benchmarks run on a modern multicore machine demonstrates that Pochoir outperforms standard parallel-loop implementations, typically running 2-10 times faster. The algorithm behind Pochoir improves on prior cache-efficient algorithms on multidimensional grids by making "hyperspace" cuts, which yield asymptotically more parallelism for the same cache efficiency.
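To make the notion of a stencil computation concrete, the sketch below shows a plain loop-based 3-point heat stencil on a 1D grid with periodic boundaries. It is only an illustration of the kind of straightforward loop code the abstract compares against. It is not Pochoir's specification language, and it does not implement the trapezoidal or hyperspace-cut algorithm. The names `heat_step` and `heat_run` and the coefficient `alpha` are assumptions introduced here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One time step of a 3-point heat stencil on a periodic 1D grid.
// Each point is updated from itself and its two nearest neighbors.
// `alpha` is a hypothetical diffusion coefficient chosen for illustration.
std::vector<double> heat_step(const std::vector<double>& u, double alpha) {
    assert(!u.empty());
    const std::size_t n = u.size();
    std::vector<double> next(n);
    for (std::size_t i = 0; i < n; ++i) {
        // Periodic boundary: indices wrap around the ends of the grid.
        const double left = u[(i + n - 1) % n];
        const double right = u[(i + 1) % n];
        next[i] = u[i] + alpha * (left - 2.0 * u[i] + right);
    }
    return next;
}

// Apply the stencil repeatedly for `steps` time steps.
std::vector<double> heat_run(std::vector<double> u, double alpha, int steps) {
    for (int t = 0; t < steps; ++t) {
        u = heat_step(u, alpha);
    }
    return u;
}
```

Each time step in this version sweeps the whole grid, so a grid larger than the cache is reloaded from memory on every step. The cache-oblivious decompositions mentioned in the abstract aim to avoid that by reusing data across several time steps.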
