International Symposium on Microarchitecture

SCRATCH: An End-to-End Application-Aware Soft-GPGPU Architecture and Trimming Tool

Abstract

Applying advanced signal-processing and artificial-intelligence algorithms is often constrained by power and energy limits in high-performance, embedded, cyber-physical, and supercomputing devices and systems. Graphics Processing Units (GPUs) have helped mitigate the throughput-per-Watt problem in many compute-intensive applications. However, meeting the autonomy requirements of intelligent systems more efficiently demands power-oriented customized architectures tuned specifically for each application. Ideally, these architectures should not require a manual redesign of the entire hardware and should still support legacy code. Hence, this work proposes SCRATCH, a new framework that automatically identifies the specific instruction-set and compute-unit requirements of each application kernel, allowing the generation of application-specific, FPGA-implementable, trimmed-down GPU-inspired architectures. The work is based on an improved version of the original MIAOW system (here named MIAOW2.0), which is extended to support a set of 156 instructions and enhanced with a fast prefetch memory system and a dual-clock domain. Experimental results on 17 highly relevant benchmarks using integer and floating-point arithmetic show an average 140× speedup and 115× higher energy efficiency (instructions per Joule) compared with the original MIAOW system. Against our optimized version without pruning, the gains are a 2.4× speedup and a 2.1× improvement in energy efficiency.
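As a rough illustration of the kernel-analysis idea described above, the Python sketch below works through three steps. It scans a kernel's instruction mnemonics and records which distinct instructions appear. It then maps each instruction to a coarse hardware unit and lists the units that no instruction needs, which are the candidates for trimming. The mnemonics follow the AMD Southern Islands naming style that MIAOW implements. The prefix-to-unit table, the unit names, and the helpers `classify`, `profile_kernel`, and `trim` are all hypothetical. This is a sketch under those assumptions, not SCRATCH's actual tool or hardware model.

```python
from collections import Counter

# Hypothetical prefix-to-unit rules for AMD Southern Islands-style mnemonics.
# Order matters: more specific prefixes must come before the generic "s_"/"v_".
# This is an illustrative model, not the unit breakdown used by SCRATCH.
UNIT_RULES = [
    ("s_buffer_load", "scalar_memory"),
    ("s_load", "scalar_memory"),
    ("buffer_", "vector_memory"),
    ("tbuffer_", "vector_memory"),
    ("ds_", "local_data_share"),
    ("s_", "scalar_alu"),
    ("v_", "vector_alu"),
]

# Hypothetical full set of units in an untrimmed core.
ALL_UNITS = ["scalar_alu", "scalar_memory", "vector_alu", "vector_fpu",
             "vector_memory", "local_data_share"]


def classify(mnemonic):
    """Map one instruction mnemonic to the (hypothetical) unit that executes it."""
    for prefix, unit in UNIT_RULES:
        if mnemonic.startswith(prefix):
            # Treat vector ops with a floating-point type suffix as FPU work.
            if unit == "vector_alu" and ("_f32" in mnemonic or "_f64" in mnemonic):
                return "vector_fpu"
            return unit
    return "unknown"


def profile_kernel(mnemonics):
    """Return per-instruction counts and the set of units the kernel needs."""
    used = Counter(mnemonics)
    units = {classify(m) for m in used}
    return used, units


def trim(all_units, required_units):
    """List the units no instruction in the kernel requires (pruning candidates)."""
    return sorted(set(all_units) - set(required_units))


# Hand-written example kernel, not taken from the paper's benchmarks.
kernel = ["s_load_dwordx4", "s_buffer_load_dword", "v_mul_lo_i32", "v_add_i32",
          "buffer_load_dword", "v_add_f32", "buffer_store_dword", "s_endpgm"]
used, units = profile_kernel(kernel)
print(len(used))               # distinct instructions to keep: 8
print(sorted(units))
print(trim(ALL_UNITS, units))  # -> ['local_data_share']
```

In a real flow the mnemonic list would presumably come from the compiled kernel binary. The distinct keys of `used` would then define the instruction subset the trimmed core must decode. Here the kernel is only a hand-written example, so the single pruned unit reflects that toy input rather than any result from the paper.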