Journal: Concurrency and Computation

StarPU: a unified platform for task scheduling on heterogeneous multicore architectures



Abstract

In the field of HPC, the current hardware trend is to design multiprocessor architectures featuring heterogeneous technologies such as specialized coprocessors (e.g. Cell/BE) or data-parallel accelerators (e.g. GPUs). Approaching the theoretical performance of these architectures is a complex issue. Indeed, substantial efforts have already been devoted to efficiently offloading parts of the computations. However, designing an execution model that unifies all computing units and their associated embedded memory remains a major challenge. We therefore designed StarPU, an original runtime system providing a high-level, unified execution model tightly coupled with an expressive data management library. The main goal of StarPU is to provide numerical kernel designers with a convenient way to generate parallel tasks over heterogeneous hardware on the one hand, and to easily develop and tune powerful scheduling algorithms on the other hand. We have developed several scheduling strategies that can be selected seamlessly at run-time, and we have analyzed their efficiency on several algorithms running simultaneously over multiple cores and a GPU. In addition to substantial improvements in execution time, we have obtained consistent superlinear parallelism by actually exploiting the heterogeneous nature of the machine. Finally, we show that our dynamic approach competes with the highly optimized MAGMA library and overcomes the limitations of the corresponding static scheduling in a portable way.
