IEEE Conference on High Performance Extreme Computing

Heterogeneous work-stealing across CPU and DSP cores

Abstract

Owing to tightening power constraints and ever-higher performance demands, many vendors have shifted their focus from high-performance computer nodes built around powerful multicore general-purpose CPUs to nodes that pair a smaller number of general-purpose CPUs with a larger number of more power-efficient special-purpose processing units, such as GPUs, FPGAs, or DSPs. Although such heterogeneous systems offer a lower power-to-performance ratio, they are notoriously hard to program: users must resort to low-level direct programming of the special-purpose processors and must manually manage data transfer and synchronization between the parts of the program running on the general-purpose CPUs and those running on the special-purpose processors. In this paper, we present HC-K2H, a programming model and runtime system for the Texas Instruments Keystone II Hawking platform, which consists of 4 ARM CPUs and 8 TI DSP processors. This System-on-a-Chip (SoC) offers high floating-point performance with lower power requirements than other processors of comparable performance. We present the design and implementation of a hybrid programming model and work-stealing runtime that allows tasks to be created and executed on both the ARM and DSP cores, and that enables seamless execution and synchronization of tasks regardless of whether they run on the ARM or the DSP. Our programming model and runtime are designed as an extension of the Habanero-C programming system. We evaluate our implementation using task-parallel benchmarks on a Hawking board and demonstrate excellent scaling compared to sequential implementations on a single ARM processor.
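To make the work-stealing idea in the abstract concrete, the following is a minimal sketch and not the HC-K2H or Habanero-C API. It is a sequential Python simulation in which each worker owns a double-ended queue, takes its own newest task from one end, and, when idle, steals the oldest task from another worker's opposite end. The names `Worker` and `run`, the "ARM"/"DSP" labels, the range-sum workload, and the round-robin victim selection are all assumptions chosen for illustration. A real runtime would run workers concurrently and would also have to handle memory sharing and synchronization between ARM and DSP cores, which this sketch does not model.

```python
from collections import deque


class Worker:
    """One simulated core; `kind` is only a label ("ARM" or "DSP")."""

    def __init__(self, name, kind):
        self.name = name
        self.kind = kind
        self.tasks = deque()   # this worker's own task deque
        self.executed = 0      # tasks this worker ran
        self.steals = 0        # tasks this worker took from others


def run(workers, root_range, grain=4):
    """Sum the integers in [lo, hi) by recursive splitting with work stealing.

    Hypothetical workload: each task is a (lo, hi) range that is either
    summed directly (small ranges) or split into two child tasks.
    """
    workers[0].tasks.append(root_range)
    total = 0
    # Each pass gives every worker one turn; a popped task is always
    # finished within the same turn, so empty deques mean no work remains.
    while any(w.tasks for w in workers):
        for i, w in enumerate(workers):
            if w.tasks:
                task = w.tasks.pop()  # owner takes its newest task
            else:
                task = None
                # Round-robin victim scan (assumption made for determinism;
                # many runtimes pick victims at random instead).
                for k in range(1, len(workers)):
                    victim = workers[(i + k) % len(workers)]
                    if victim.tasks:
                        task = victim.tasks.popleft()  # thief takes oldest task
                        w.steals += 1
                        break
                if task is None:
                    continue  # nothing to steal on this turn
            lo, hi = task
            w.executed += 1
            if hi - lo <= grain:
                total += sum(range(lo, hi))   # leaf task: do the work
            else:
                mid = (lo + hi) // 2          # spawn two child tasks
                w.tasks.append((lo, mid))
                w.tasks.append((mid, hi))
    return total


# Mirror the 4 ARM + 8 DSP core counts mentioned in the abstract.
workers = [Worker(f"arm{i}", "ARM") for i in range(4)]
workers += [Worker(f"dsp{i}", "DSP") for i in range(8)]
total = run(workers, (0, 1000))
```

Here `total` equals the sum of the integers from 0 to 999, and the per-worker `executed` and `steals` counters show how the initial task placed on a single ARM worker spreads to the other workers through stealing.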