Conference: ACM/IEEE Annual International Symposium on Computer Architecture

Efficiently Supporting Dynamic Task Parallelism on Heterogeneous Cache-Coherent Systems

Abstract

Manycore processors, with tens to hundreds of tiny cores but no hardware-based cache coherence, can offer tremendous peak throughput on highly parallel programs while being complexity- and energy-efficient. Manycore processors can be combined with a few high-performance big cores for executing operating systems, legacy code, and serial regions. These systems use heterogeneous cache coherence (HCC), with hardware-based cache coherence between big cores and software-centric cache coherence between tiny cores. Unfortunately, programming these heterogeneous cache-coherent systems to enable collaborative execution is challenging, especially when considering dynamic task parallelism. This paper seeks to address this challenge using a combination of lightweight software and hardware techniques. We provide a detailed description of how to implement a work-stealing runtime to enable dynamic task parallelism on heterogeneous cache-coherent systems. We also propose direct task stealing (DTS), a new technique based on user-level interrupts to bypass the memory system and thus improve the performance and energy efficiency of work stealing. Our results demonstrate that executing dynamic task-parallel applications on a 64-core system (4 big, 60 tiny) with complexity-effective HCC and DTS can achieve: $7\times$ speedup over a single big core; $1.4\times$ speedup over an area-equivalent eight big-core system with hardware-based cache coherence; and 21% better performance and similar energy efficiency compared to a 64-core system (4 big, 60 tiny) with full-system hardware-based cache coherence.
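
For readers unfamiliar with work stealing, the general idea can be illustrated with a minimal sketch. It is not the paper's runtime: it is a single-threaded, round-robin simulation that models neither cache-coherence actions nor the user-level interrupts used by DTS. All names (`Worker`, `run`, `make_range_task`) are hypothetical and chosen for illustration only. Each worker pops its newest task from the tail of its own deque, and an idle worker steals the oldest task from the head of a randomly chosen victim's deque.

```python
import random
from collections import deque


class Worker:
    """Hypothetical worker: owns a deque of pending tasks."""

    def __init__(self, wid):
        self.wid = wid
        self.tasks = deque()  # owner uses the right end, thieves the left end
        self.executed = 0
        self.steals = 0


def make_range_task(lo, hi, grain):
    """Task that sums range(lo, hi), splitting in half until <= grain items."""

    def task():
        if hi - lo <= grain:
            return [], sum(range(lo, hi))  # leaf: no children, a partial sum
        mid = (lo + hi) // 2
        children = [make_range_task(lo, mid, grain),
                    make_range_task(mid, hi, grain)]
        return children, None  # internal node: spawn two children

    return task


def run(num_workers, root_task, seed=0):
    """Round-robin simulation of work stealing; returns (results, workers)."""
    rng = random.Random(seed)
    workers = [Worker(i) for i in range(num_workers)]
    workers[0].tasks.append(root_task)
    results = []
    while any(w.tasks for w in workers):
        for w in workers:
            if w.tasks:
                task = w.tasks.pop()  # local work: newest task first (LIFO)
            else:
                victim = rng.choice(workers)
                if victim is w or not victim.tasks:
                    continue  # failed steal attempt; retry on the next round
                task = victim.tasks.popleft()  # steal the oldest task (FIFO)
                w.steals += 1
            children, value = task()
            w.tasks.extend(children)
            if value is not None:
                results.append(value)
            w.executed += 1
    return results, workers


if __name__ == "__main__":
    results, workers = run(4, make_range_task(0, 1000, 10))
    print("total:", sum(results))
    for w in workers:
        print(f"worker {w.wid}: executed={w.executed} steals={w.steals}")
```

A real runtime would run workers concurrently and synchronize deque accesses. On a system with software-centric coherence, it would also need explicit steps to make stolen task data visible across cores, which this sketch leaves out entirely.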
