Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Efficient Job Offloading in Heterogeneous Systems Through Hardware-Assisted Packet-Based Dispatching and User-Level Runtime Infrastructure

Abstract

Emerging heterogeneous system architectures increasingly integrate general-purpose processors, GPUs, and other specialized computational units to provide both power and performance benefits. While the motivations for developing systems with accelerators are clear, it is important to design dispatching mechanisms that are efficient in both performance and energy while leveraging programmability and orchestration of the diverse computational components. In this paper, we present an infrastructure composed of a general, hardware, packet-based processing-dispatching unit, named the generic packet processing unit (GPPU), and an associated runtime that facilitates user-level access to GPPU objects such as packets, queues, and contexts. This removes the drawbacks of traditional, costly user-to-kernel-level operations and of low-level accelerator subtleties that hinder programming productivity, along with architectural obstacles such as handling the accelerators' unified virtual address space. We present the design and evaluation of our framework by integrating the GPPU infrastructure with data streaming type accelerators, image filtering, and matrix multiplication, tightly coupled to the ARMv8 architecture via unified virtual memory. Under scaling workloads, our proposed dispatching methods can deliver a $3.7\times$ performance improvement over baseline offloading and up to $4.7\times$ better energy efficiency.
