Journal: Parallel Computing

Toward data-driven architectural support in improving the performance of future HPC architectures



Abstract

We propose architectures based on Data-Driven Multithreading (DDM), a hybrid control-flow/data-flow model, to address the concurrency challenges faced by future High-Performance Computing (HPC) systems. We focus on the design and implementation of an optimized hardware Thread Scheduling Unit (TSU) and its integration into a multi-core system dubbed MiDAS. The TSU is the core of the DDM model: it orchestrates the execution of multiple threads on sequential processors based on data availability. MiDAS was prototyped on a Xilinx Virtex-6 FPGA and extensively evaluated using several microbenchmarks, showing that its performance grows linearly with the number of processing cores, even when running benchmarks with very small problem sizes. Under the largest problem size tested and with all 8 available cores utilized, MiDAS achieves an average speedup of 7.91x, exhibiting 98.8% utilization efficiency. Further, several results pertaining to the proposed hardware TSU are provided, including FPGA resource requirements, where it is found that MiDAS's TSU incurs relatively small overheads and reduced power consumption, while the various TSU operations complete with low latency. To back these claims, the proposed DDM-based TSU is compared with the Task Superscalar architecture, which implements the StarSs programming framework in hardware. The comparison shows that the proposed TSU requires much less hardware and energy to operate. Specifically, Task Superscalar is found to be 4.94x larger than the DDM-supporting TSU in terms of slice register requirements and 11.34x larger with respect to the slice look-up table count. Last, the hardware TSU is compared with a software TSU implementation offering identical functionality, with both run on an FPGA fabric under a synthetic application; a detailed performance evaluation shows that MiDAS's hardware-implemented TSU significantly outperforms its software-based counterpart. (C) 2019 Elsevier B.V. All rights reserved.
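
To make "execution based on data availability" concrete, here is a minimal software sketch of data-driven scheduling. It is not the MiDAS TSU or its interface, which the abstract does not describe. The per-thread ready count, the function name data_driven_schedule, and the example thread names are all assumptions made for illustration: each thread is taken to wait on a counter of unfinished producers and to become runnable when that counter reaches zero.

```python
from collections import deque


def data_driven_schedule(bodies, consumers):
    """Run each thread body once all of its producers have finished.

    bodies:    dict mapping a thread id to a zero-argument callable
    consumers: dict mapping a thread id to the ids that consume its output
    Returns the order in which the threads were executed.
    (Hypothetical helper; not an API from the paper.)
    """
    # Assumed mechanism: ready count = number of producers still pending.
    ready_count = {tid: 0 for tid in bodies}
    for deps in consumers.values():
        for consumer in deps:
            ready_count[consumer] += 1

    # Threads with no pending inputs can run immediately.
    ready = deque(tid for tid in bodies if ready_count[tid] == 0)
    order = []
    while ready:
        tid = ready.popleft()
        bodies[tid]()  # the body itself runs as ordinary sequential code
        order.append(tid)
        # Completion makes this thread's output available to its consumers.
        for consumer in consumers.get(tid, ()):
            ready_count[consumer] -= 1
            if ready_count[consumer] == 0:
                ready.append(consumer)
    return order


# Illustrative dependency graph: two loads feed an add, which feeds a report.
results = {}
bodies = {
    "load_a": lambda: results.update(a=2),
    "load_b": lambda: results.update(b=3),
    "add": lambda: results.update(total=results["a"] + results["b"]),
    "report": lambda: results.update(text=f"total={results['total']}"),
}
consumers = {"load_a": ["add"], "load_b": ["add"], "add": ["report"]}
order = data_driven_schedule(bodies, consumers)
```

In this toy graph, "add" cannot start until both loads have completed, and "report" waits on "add". A hardware unit would perform the same kind of bookkeeping outside the processor cores, which is the overhead the hardware versus software TSU comparison in the abstract is concerned with.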
