Conference: IEEE 17th International Symposium on High Performance Computer Architecture

HAQu: Hardware-accelerated queueing for fine-grained threading on a chip multiprocessor

Abstract

Queues are commonly used in multithreaded programs for synchronization and communication. However, because software queues tend to be too expensive to support fine-grained parallelism, hardware queues have been proposed to reduce the overhead of communication between cores. Hardware queues require modifications to the processor core and need a custom interconnect. They also pose difficulties for the operating system, because their state must be preserved across context switches. To solve these problems, we propose a hardware-accelerated queue, or HAQu. HAQu adds hardware to a chip multiprocessor (CMP) that accelerates operations on software queues. Our design implements fast queueing through an application's address space, with operations that are compatible with a fully software queue. Our design provides accelerated, OS-transparent performance in three general ways: (1) it provides a single instruction each for enqueueing and dequeueing, which significantly reduces overhead in fine-grained threading; (2) operations on the queue are designed to leverage low-level details of the coherence protocol; and (3) hardware ensures that the full state of the queue is stored in the application's address space, which keeps the queue virtualized. We have evaluated our design in the context of two application domains: offloading fine-grained checks for improved software reliability, and automatic, fine-grained parallelization using decoupled software pipelining.
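For context, the sketch below shows a conventional fully software single-producer/single-consumer queue of the kind whose per-operation cost motivates HAQu: every enqueue or dequeue performs several loads and stores plus a full/empty check, and the head and tail indices bounce between cores through the coherence protocol. It also illustrates point (3): all queue state (buffer, head, tail) is ordinary memory in the application's address space, so a context switch needs no special save/restore. This is a minimal sketch assuming a power-of-two ring buffer with C11 atomics. It is not HAQu's actual queue layout or instruction interface, and the names `spsc_queue`, `spsc_enqueue`, `spsc_dequeue`, and `SPSC_CAPACITY` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacity; a power of two lets indices wrap with a mask. */
#define SPSC_CAPACITY 8u
_Static_assert((SPSC_CAPACITY & (SPSC_CAPACITY - 1)) == 0,
               "capacity must be a power of two");

/* Assumed layout (not HAQu's): all state is plain memory in the
 * application's address space, so it survives context switches as-is. */
typedef struct {
    uint64_t buf[SPSC_CAPACITY];
    _Atomic size_t head; /* next slot to dequeue; written only by consumer */
    _Atomic size_t tail; /* next slot to enqueue; written only by producer */
} spsc_queue;

static void spsc_init(spsc_queue *q) {
    assert(q != NULL);
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side: returns false if the queue is full. */
static bool spsc_enqueue(spsc_queue *q, uint64_t value) {
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    /* Acquire: the consumer's read of a slot completes before we reuse it. */
    size_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (tail - head == SPSC_CAPACITY)
        return false;
    q->buf[tail & (SPSC_CAPACITY - 1)] = value;
    /* Release: the slot write is visible before the new tail is. */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false if the queue is empty. */
static bool spsc_dequeue(spsc_queue *q, uint64_t *out) {
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    /* Acquire: pairs with the producer's release store of tail. */
    size_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head == tail)
        return false;
    *out = q->buf[head & (SPSC_CAPACITY - 1)];
    /* Release: frees the slot for the producer only after we have read it. */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

In a decoupled-software-pipelining setting, one thread would call `spsc_enqueue` in a loop and another would call `spsc_dequeue`, spinning or yielding whenever a call returns false. Note that `head` and `tail` share a cache line here, so every operation causes coherence traffic between the two cores; a tuned software queue would pad them onto separate lines, and this kind of coherence-level overhead is what point (2) of the abstract targets in hardware.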