Venue: International Workshop on Languages and Compilers for Parallel Computing

Mozart: Efficient Composition of Library Functions for Heterogeneous Execution

Abstract

The current processor trend is to couple a commodity processor with a GPU, a co-processor, or an accelerator. Unleashing the full computational power of such heterogeneous systems is a daunting task: programmers often resort to heterogeneous scheduling runtime frameworks that use device-specific library routines. However, highly tuned libraries do not compose well across heterogeneous architectures. That is, important performance-oriented optimizations such as data locality and reuse across library calls are not fully exploited. In this paper, we present a framework, called Mozart, that extends existing library frameworks to efficiently compose a sequence of library calls for heterogeneous execution. Mozart consists of two components: a library description (LD) and a library composition runtime. We advocate that library writers wrap existing libraries using LD in order to provide their performance parameters on heterogeneous cores; no programmer intervention is necessary. Our runtime composes libraries via task-fission, balances load among heterogeneous cores using information from LD, and automatically adapts to the runtime behavior of an application. We evaluate Mozart on a Xeon + 2 Xeon Phi system using the High Performance Linpack benchmark, which is the most popular benchmark for ranking supercomputers in the TOP500, and show a GFLOPS improvement of 31.7% over MKL with Automatic Offload and 6.7% over hand-optimized ninja code.
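The abstract does not give Mozart's LD format or runtime API, so the following is only a minimal sketch of the general idea of splitting one library call across heterogeneous devices in proportion to per-device performance parameters, then adjusting those parameters from observed behavior. Every name here (`DeviceProfile`, `split_rows`, `update_profile`, the `gflops` field) and every number is a hypothetical placeholder, not something taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class DeviceProfile:
    """Hypothetical stand-in for per-device performance data from a library description."""
    name: str
    gflops: float  # assumed sustained throughput of the wrapped routine on this device


def split_rows(total_rows, devices):
    """Split [0, total_rows) into contiguous chunks proportional to device throughput."""
    total = sum(d.gflops for d in devices)
    if total <= 0:
        raise ValueError("total throughput must be positive")
    chunks = {}
    start = 0
    cumulative = 0.0
    for i, dev in enumerate(devices):
        cumulative += dev.gflops
        # The last device takes the remainder so that every row is assigned exactly once.
        if i == len(devices) - 1:
            end = total_rows
        else:
            end = round(total_rows * cumulative / total)
        chunks[dev.name] = (start, end)
        start = end
    return chunks


def update_profile(dev, measured_gflops, alpha=0.5):
    """Blend a measured throughput into the profile (simple exponential smoothing)."""
    dev.gflops = (1 - alpha) * dev.gflops + alpha * measured_gflops


# Illustrative values only: one host CPU and two accelerators.
devices = [
    DeviceProfile("host", 1.0),
    DeviceProfile("phi0", 1.5),
    DeviceProfile("phi1", 1.5),
]
plan = split_rows(8000, devices)
```

With these made-up throughputs, the host receives a quarter of the rows and each accelerator receives three eighths. Calling `update_profile` after a measured run and then `split_rows` again is one simple way to approximate adapting to runtime behavior; the actual Mozart runtime may work quite differently.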