SHARQ: Software-Defined Hardware-Managed Queues for Tile-Based Manycore Architectures

Published in: International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation

Abstract

The recent trend towards tile-based manycore architectures has helped to tackle the memory wall by physically distributing memories and processing nodes. Distributed operating systems and applications make it possible to exploit the increased scalability of such architectures, but they still face the data-to-task locality challenge. Because inter-tile communication, thread synchronization, and data transport often impose significant software overhead on such architectures, many applications would benefit from a more efficient and powerful communication primitive with minimal software involvement. We propose software-defined hardware-managed queues for distributed computing architectures that enable efficient inter-tile communication by leveraging application-specific queues with arbitrarily sized elements. To ensure (remote) processing of queued elements, SHARQ introduces the concept of an optional handler task, which is scheduled by hardware on demand. Queue and memory management, intra- and inter-tile data transfer, and handler task invocation are handled entirely by hardware. Only the dynamic creation of queues at runtime is performed in software. As an example use case, we integrated SHARQ into the MPI library. The evaluation with the MPI-based NAS benchmarks shows a reduction in execution time of up to 48% for the communication-intensive IS kernel in a 4 × 4 tile design on an FPGA platform with a total of 80 LEON3 cores.
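The queue semantics the abstract describes (application-specific queues, arbitrarily sized elements, an optional handler that runs on demand) can be pictured with a small software model. This is only an illustrative sketch: in SHARQ, queue management, data transfer, and handler invocation are done by hardware, whereas here plain Python stands in for all of it. The class `ModelQueue`, its methods, and the byte-budget capacity are hypothetical names and assumptions, not the paper's interface.

```python
from collections import deque
from typing import Callable, Optional


class ModelQueue:
    """Toy software model (hypothetical, not the SHARQ API) of a queue
    holding variable-sized elements, with an optional handler."""

    def __init__(self, capacity_bytes: int,
                 handler: Optional[Callable[[bytes], None]] = None) -> None:
        # Queue creation is the one step the abstract assigns to software.
        self.capacity_bytes = capacity_bytes  # assumed: a fixed byte budget
        self.used_bytes = 0
        self.elements = deque()               # FIFO of bytes payloads
        self.handler = handler

    def enqueue(self, payload: bytes) -> bool:
        """Store one element of any size; return False if it does not fit."""
        if self.used_bytes + len(payload) > self.capacity_bytes:
            return False
        self.elements.append(payload)
        self.used_bytes += len(payload)
        if self.handler is not None:
            # Stand-in for the on-demand scheduling of the handler task.
            self._run_handler()
        return True

    def dequeue(self) -> Optional[bytes]:
        """Remove and return the oldest element, or None if empty."""
        if not self.elements:
            return None
        payload = self.elements.popleft()
        self.used_bytes -= len(payload)
        return payload

    def _run_handler(self) -> None:
        # Drain every pending element through the handler.
        while self.elements:
            self.handler(self.dequeue())


# Queue with a handler: elements are consumed as soon as they arrive.
received = []
handled_queue = ModelQueue(capacity_bytes=64, handler=received.append)
handled_queue.enqueue(b"short")
handled_queue.enqueue(b"a longer message")

# Queue without a handler: the consumer dequeues explicitly.
plain_queue = ModelQueue(capacity_bytes=8)
accepted = plain_queue.enqueue(b"12345678")
rejected = plain_queue.enqueue(b"x")  # exceeds the byte budget
first = plain_queue.dequeue()
```

The handler is optional in this model as in the abstract: without one, elements stay queued until a consumer explicitly dequeues them.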
