Source-to-source compilation of loop programs for manycore processors

Abstract

It is widely accepted today that the end of microprocessor performance growth based on increasing clock speeds and instruction-level parallelism (ILP) demands new ways of exploiting transistor densities. Manycore processors (most commonly known as GPGPUs or simply GPUs) provide a viable solution to this performance scaling bottleneck through large numbers of lightweight compute cores and memory hierarchies that rely primarily on software for their efficient utilization. The widespread proliferation of this class of architectures today is a clear indication that exposing and managing parallelism on a large scale as well as efficiently orchestrating on-chip data movement is becoming an increasingly critical concern for high-performance software development. In such a computing landscape, performance portability -- the ability to exploit the power of a variety of manycore chips while minimizing the impact on software development and productivity -- is perhaps one of the most important and challenging objectives for our research community.

This thesis is about performance portability for manycore processors and how source-to-source compilation can help us achieve it. In particular, we show that for an important set of loop-programs, performance portability is attainable at low cost through compile-time polyhedral analysis and optimization and parametric tiling for run-time performance tuning. In other words, we propose and evaluate a source-to-source compilation path that takes affine loop-programs as input and produces parametrically tiled parallel code amenable to run-time tuning across different manycore platforms and devices -- a very useful and powerful property if we seek performance portability because it decouples the compiler from the performance tuning process. The produced code relies on a platform-independent run-time environment, called Avelas, that allows us to formulate a robust and portable code generation algorithm. Our experimental evaluation shows that Avelas induces low run-time overhead and even substantial speed-ups for wavefront-parallel programs compared to a state-of-the-art compile-time scheme with no run-time support. We also claim that the low overhead of Avelas is a strong indication that it can also be effective as a general-purpose programming model for manycore processors as we demonstrate for a set of ParBoil benchmarks.
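To illustrate what "parametrically tiled" means in the abstract, the following is a minimal C sketch of an affine loop nest (a matrix transpose) whose tile sizes are ordinary run-time arguments. It is not code from the thesis, nor the output of its compiler or of Avelas; the function names `transpose_ref` and `transpose_tiled` and the parameters `tile_i` and `tile_j` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Untiled reference: out (cols x rows) is the transpose of in (rows x cols). */
void transpose_ref(const double *in, double *out, size_t rows, size_t cols) {
    for (size_t i = 0; i < rows; ++i)
        for (size_t j = 0; j < cols; ++j)
            out[j * rows + i] = in[i * cols + j];
}

/* Parametrically tiled version (illustrative assumption, not the thesis code):
 * tile_i and tile_j are run-time values, so different tile sizes can be
 * tried on a given device without recompiling. */
void transpose_tiled(const double *in, double *out, size_t rows, size_t cols,
                     size_t tile_i, size_t tile_j) {
    assert(tile_i > 0 && tile_j > 0);
    for (size_t ii = 0; ii < rows; ii += tile_i) {       /* loops over tiles */
        for (size_t jj = 0; jj < cols; jj += tile_j) {
            /* Clamp the bounds so partial tiles at the edges are handled. */
            size_t i_end = min_size(ii + tile_i, rows);
            size_t j_end = min_size(jj + tile_j, cols);
            for (size_t i = ii; i < i_end; ++i)          /* loops within a tile */
                for (size_t j = jj; j < j_end; ++j)
                    out[j * rows + i] = in[i * cols + j];
        }
    }
}
```

In this sketch every tile writes a disjoint part of `out`, so the tiles could be distributed across compute cores; a tuner would call `transpose_tiled` with several `(tile_i, tile_j)` pairs and keep the fastest one for the target platform.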

Bibliographic details

  • Author

    Konstantinidis Athanasios

  • Year 2014
  • Original format PDF
