Source: IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Self-Optimizing and Self-Programming Computing Systems: A Combined Compiler, Complex Networks, and Machine Learning Approach

Abstract

There is an urgent need to determine the right amount and type of specialization while keeping a heterogeneous system as programmable and flexible as possible. In this paper, we therefore pioneer a self-optimizing and self-programming computing system (SOSPCS) design framework that achieves both programmability and flexibility and exploits computing heterogeneity [e.g., CPUs, GPUs, and hardware accelerators (HWAs)]. First, at compile time, we form a task pool of hybrid tasks with different processing element (PE) affinities according to the target applications. Tasks better suited to GPUs or accelerators are detected in the target applications by neural networks. Tasks suitable for CPUs are formed by community detection to minimize data movement overhead. Next, at runtime, a distributed reinforcement learning-based approach lets agents map the tasks onto the network-on-chip-based heterogeneous PEs by learning an optimal policy based on Q values in the environment. We conducted experiments on a heterogeneous platform consisting of CPUs, GPUs, and HWAs, using deep learning algorithms such as matrix multiplication, ReLU, and sigmoid functions. We conclude that SOSPCS provides a performance improvement of up to 4.12x and an energy reduction of up to 3.24x compared with state-of-the-art approaches.
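
The runtime mapping step described above can be illustrated with a minimal sketch of tabular Q-learning. This is not the authors' implementation: it uses a single agent rather than the distributed agents in the abstract, and the task kinds, the PE names, the cost table, and the function names `train` and `greedy_policy` are all hypothetical choices made for illustration. The reward is assumed to be the negative execution cost of running a task on the chosen PE.

```python
import random

# Hypothetical toy setup: none of these names or numbers come from the paper.
TASK_KINDS = ["cpu_friendly", "gpu_friendly", "hwa_friendly"]
PES = ["CPU", "GPU", "HWA"]

# Assumed execution cost (arbitrary units) of each task kind on each PE.
COST = {
    "cpu_friendly": {"CPU": 1.0, "GPU": 3.0, "HWA": 4.0},
    "gpu_friendly": {"CPU": 5.0, "GPU": 1.0, "HWA": 2.0},
    "hwa_friendly": {"CPU": 6.0, "GPU": 2.0, "HWA": 1.0},
}


def train(task_stream, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn Q[task_kind][pe] with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    q = {kind: {pe: 0.0 for pe in PES} for kind in TASK_KINDS}
    for _ in range(episodes):
        for i, kind in enumerate(task_stream):
            # Explore with probability epsilon, otherwise pick the best known PE.
            if rng.random() < epsilon:
                pe = rng.choice(PES)
            else:
                pe = max(q[kind], key=q[kind].get)
            # Assumed reward: lower execution cost is better.
            reward = -COST[kind][pe]
            # Bootstrap from the next task in the stream; the last task is terminal.
            if i + 1 < len(task_stream):
                future = max(q[task_stream[i + 1]].values())
            else:
                future = 0.0
            # Standard Q-learning update toward reward + discounted future value.
            q[kind][pe] += alpha * (reward + gamma * future - q[kind][pe])
    return q


def greedy_policy(q):
    """Return the PE with the highest learned Q value for each task kind."""
    return {kind: max(q[kind], key=q[kind].get) for kind in q}


if __name__ == "__main__":
    q_table = train(["cpu_friendly", "gpu_friendly", "hwa_friendly"])
    print(greedy_policy(q_table))
```

Under the assumed cost table, the greedy policy should send each task kind to its cheapest PE. A real system would replace the cost table with measured latency or energy and would let several agents learn concurrently, as the abstract describes.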
