Automatic design of efficient application-centric architectures.

Abstract

As the market for embedded devices continues to grow, the demand for high-performance, low-cost, and low-power computation grows as well. Many embedded applications perform computationally intensive tasks such as processing streaming video or audio, wireless communication, or speech recognition. Often, performance requirements are on the order of 10-100 billion operations per second and must be met within tight power budgets on the order of 100 mW. Typically, general-purpose processors are not able to meet these performance and power requirements. Custom hardware in the form of loop accelerators is often used to execute the compute-intensive portions of these applications because it can achieve significantly higher levels of performance and power efficiency.

Automated hardware synthesis from high-level specifications is a key technology used in designing these accelerators, because the resulting hardware is correct by construction, easing verification and greatly decreasing time-to-market in the quickly evolving embedded domain. In this dissertation, a compiler-directed approach is used to design a loop accelerator from a C specification and a throughput requirement. The compiler analyzes the loop and generates a virtual architecture containing sufficient resources to sustain the required throughput. Next, a software pipelining scheduler maps the operations in the loop to the virtual architecture. Finally, the accelerator datapath is derived from the resulting schedule.

In this dissertation, synthesis of different types of loop accelerators is investigated. First, the system for synthesizing single-loop accelerators is detailed. In particular, a scheduler is presented that is aware of the effects of its decisions on the resulting hardware and attempts to minimize hardware cost. Second, synthesis of multifunction loop accelerators, that is, accelerators capable of executing multiple loops, is presented. Such accelerators exploit coarse-grained hardware sharing across loops in order to reduce overall cost. Finally, synthesis of post-programmable accelerators is presented, allowing changes to be made to the software after an accelerator has been created.

The tradeoffs between the flexibility, cost, and energy efficiency of these different types of accelerators are investigated. Automatically synthesized loop accelerators are capable of achieving order-of-magnitude gains in performance, area efficiency, and power efficiency over processors, and programmable accelerators allow software changes while maintaining highly efficient levels of computation.
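The abstract does not spell out how the virtual architecture is sized, so the following is only a minimal sketch of the standard resource-bound reasoning used in software pipelining, not the dissertation's actual algorithm. It assumes fully pipelined functional units that each accept one operation per cycle, ignores recurrence (loop-carried dependence) constraints, and uses hypothetical names (`max_initiation_interval`, `size_virtual_architecture`) and made-up operation counts. A throughput requirement fixes the largest allowable initiation interval (II), and each unit type then needs at least ceil(operations / II) instances.

```python
from math import ceil


def max_initiation_interval(clock_hz: int, iterations_per_sec: int) -> int:
    """Largest II (cycles between loop iteration starts) that still meets
    the required throughput at the given clock frequency."""
    ii = clock_hz // iterations_per_sec
    if ii < 1:
        raise ValueError("throughput requirement exceeds one iteration per cycle")
    return ii


def size_virtual_architecture(op_counts: dict, ii: int) -> dict:
    """Minimum number of functional units of each type so that all
    operations of one loop iteration fit within II cycles.

    Assumes each unit is fully pipelined (one new operation per cycle).
    """
    if ii < 1:
        raise ValueError("II must be at least 1")
    return {fu: ceil(count / ii) for fu, count in op_counts.items()}


# Hypothetical loop body: 7 adds, 3 multiplies, 4 memory accesses.
loop_ops = {"add": 7, "mul": 3, "mem": 4}

# Hypothetical requirement: 50 million iterations/s on a 200 MHz clock.
ii = max_initiation_interval(200_000_000, 50_000_000)
units = size_virtual_architecture(loop_ops, ii)
print(ii, units)
```

Under these assumptions a looser throughput target (larger II) lets operations share fewer units, which is the cost lever a hardware-aware scheduler can exploit.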
