Application and system support for reconfigurable coprocessors in multicore devices.

Abstract

Embedded multicore devices often require high performance with minimal power consumption, and many systems use dedicated hardware units to meet these constraints. However, embedded systems have also become increasingly multi-purpose and must be able to execute a wide range of applications, some of which may not be known at design time. It is therefore difficult to choose a mix of dedicated hardware that meets a device's size, cost, and capability constraints. A reconfigurable hardware (RH) coprocessor is a potential solution: it can effectively accelerate a variety of different tasks (which need not be known in advance), and it does so using less energy than a general-purpose processor.

In this thesis, I propose a reconfigurable computing system-on-chip that combines one or more general-purpose processor cores with a reconfigurable coprocessor. Applications executing on this system use the RH to accelerate commonly executed functions. I first describe the communication model used between the processor(s) and the RH coprocessor. I then describe the programming interface that applications use to access the RH, and show that my model allows applications to access the RH coprocessor securely without operating system intervention, which greatly reduces the overhead of using the coprocessor. As a result, my RH coprocessor can accelerate even tasks (or application kernels) whose execution time in software is measured in hundreds of cycles.

After establishing the platform, I evaluate how the proposed system performs and propose extensions that further improve its performance. I demonstrate that, when using my coprocessor memory interface, workloads executing across eight processor cores and the shared RH fabric perform ∼95% as well as they would on an idealized system in which the coprocessor has zero-cycle access to shared memory. Additionally, I examine the impact that hybrid RH/software applications have on software-only applications, and propose a mechanism that prevents streaming RH applications from polluting the shared levels of the system's cache; this simple modification improved the performance of software-only applications by up to 32%. I also examine the behavior of software-only applications coscheduled alongside hybrid RH/software applications on simultaneous multithreaded processors, showing that they run up to ∼95% as fast as they do when the two applications execute on a multicore system. This is much faster than two software applications can run when coscheduled together, but not as fast as on a multicore machine, because the hybrid application still requires CPU resources to execute, which slows down the coscheduled software-only application.

Finally, I examine methods that allow multicore RH systems to better utilize RH resources, so that systems with limited RH resources perform nearly as well as systems containing more RH resources. I first show that hybrid applications that call the same RH kernel can better utilize the RH by sharing the configured resources. On eight-processor systems executing eight copies of the same application, workloads that shared configured RH kernels performed 97.4% as well as systems that did not, even though the sharing systems required only about one-eighth of the RH resources. I also examine a modified RH kernel scheduling algorithm that periodically determines which RH kernels should be loaded on the RH at any given time, and can therefore better select which RH kernels to configure on multicore systems. I show that this new scheduler always performs as well as or better than the previous scheduler, and in extreme cases yields RH allocations that improve system performance by over 2x.

In this thesis, I examine many of the design choices involved in creating a multicore RH computing system, and how a modern operating system should present RH resources to user applications. I then demonstrate that such a system provides the performance required by next-generation computing applications, while offering the programmability and flexibility to accelerate many different application domains, and even improving the performance of applications that were not considered when the chip was first fabricated. With such systems, embedded systems manufacturers can build faster, more capable products that consume less energy. Additionally, the hardware in these devices will be able to adapt to new applications created after the device has shipped, allowing all applications to be accelerated, not just the applications the processor was optimized for.
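
The abstract does not specify how the modified scheduler decides which kernels to configure. Purely as a hypothetical illustration of a periodic (epoch-based) kernel-selection policy, the Python sketch below counts invocations of each kernel during an epoch and, at each epoch boundary, greedily configures the kernels with the highest estimated benefit per unit of fabric area until an assumed fabric capacity is used up. The names `KernelStats`, `EpochScheduler`, and `select_kernels`, the `area` and `cycles_saved` fields, and the benefit metric are all assumptions made for this example; this is not the thesis's algorithm.

```python
from dataclasses import dataclass


@dataclass
class KernelStats:
    """Hypothetical per-kernel bookkeeping for one RH kernel."""
    name: str
    area: int          # assumed fabric units the kernel occupies when configured
    cycles_saved: int  # assumed cycles saved per call vs. running in software
    calls: int = 0     # invocations observed during the current epoch


def select_kernels(kernels, capacity):
    """Greedily choose kernels by estimated benefit per unit of area.

    This is the standard greedy knapsack heuristic; it is simple but
    not guaranteed to find the optimal allocation.
    """
    ranked = sorted(
        (k for k in kernels if k.calls > 0),  # ignore kernels idle this epoch
        key=lambda k: k.calls * k.cycles_saved / k.area,
        reverse=True,
    )
    chosen, used = [], 0
    for k in ranked:
        if used + k.area <= capacity:  # skip kernels that no longer fit
            chosen.append(k.name)
            used += k.area
    return chosen


class EpochScheduler:
    """Hypothetical epoch-based scheduler for a shared RH fabric."""

    def __init__(self, kernels, capacity):
        # One entry per kernel name: calls from every core to the same
        # kernel are aggregated, modelling a single shared configured copy.
        self.kernels = {k.name: k for k in kernels}
        self.capacity = capacity
        self.configured = []

    def record_call(self, name):
        self.kernels[name].calls += 1

    def end_epoch(self):
        # Re-plan the fabric contents, then reset counters for the next epoch.
        self.configured = select_kernels(self.kernels.values(), self.capacity)
        for k in self.kernels.values():
            k.calls = 0
        return self.configured
```

For example, with an assumed capacity of 6 units and kernels A (area 4, 100 cycles saved), B (area 2, 30 cycles saved), and C (area 3, 80 cycles saved) called 10, 50, and 5 times in one epoch, the benefit densities are 250, 750, and about 133, so `end_epoch()` configures `["B", "A"]` and C stays in software. Ranking by density rather than raw call count keeps a small, frequently called kernel from being displaced by a large one that saves little per unit of area.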

Bibliographic record

  • Author

    Garcia, Philip C.

  • Author affiliation

    The University of Wisconsin - Madison.

  • Degree-granting institution: The University of Wisconsin - Madison.
  • Subject: Engineering, Computer; Computer Science.
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 153
  • Original format: PDF
  • Language: English