Journal: Concurrency and Computation: Practice and Experience

Faithful performance prediction of a dynamic task-based runtime system for heterogeneous multi-core architectures

Abstract

Multi-core architectures comprising several graphics processing units (GPUs) have become mainstream in the field of high-performance computing. However, obtaining the maximum performance of such heterogeneous machines is challenging, as it requires carefully offloading computations and managing data movements between the different processing units. The most promising and successful approaches so far build on task-based runtimes that abstract the machine and rely on opportunistic scheduling algorithms. As a consequence, the problem shifts to choosing the task granularity and task graph structure and to optimizing the scheduling strategies. Trying different combinations of these alternatives is itself a challenge. Indeed, obtaining accurate measurements requires reserving the target system for the whole duration of the experiments. Furthermore, observations are limited to the few systems at hand and may be difficult to generalize. In this article, we show how we crafted a coarse-grain hybrid simulation/emulation of StarPU, a dynamic runtime for hybrid architectures, over SimGrid, a versatile simulator of distributed systems. This approach yields performance predictions of classical dense linear algebra kernels that are accurate within a few percent and are obtained in a matter of seconds, which allows both runtime and application designers to quickly decide which optimization to enable or whether it is worth investing in higher-end graphics processing units. Additionally, it makes it possible to conduct robust and extensive scheduling studies in a controlled environment whose characteristics are very close to those of real platforms while having reproducible behavior.
