Journal: Parallel Computing

Toward data-driven architectural support in improving the performance of future HPC architectures

Abstract

We propose architectures based on Data-Driven Multithreading (DDM), a hybrid control-flow/data-flow model, to address the concurrency challenges faced by future High-Performance Computing (HPC) systems. We focus on the design and implementation of an optimized hardware Thread Scheduling Unit (TSU) and its integration into a multi-core system dubbed MiDAS. The TSU is the core of the DDM model: it orchestrates the execution of multiple threads on sequential processors based on data availability. MiDAS was prototyped on a Xilinx Virtex-6 FPGA and extensively evaluated using several micro-benchmarks, showing that performance grows linearly with the processing core count, even on benchmarks with very small problem sizes. Under the largest problem size tested and with all 8 available cores in use, MiDAS achieves an average speedup of 7.91x, exhibiting 98.8% utilization efficiency. Further results on the proposed hardware TSU are provided, including FPGA resource requirements: MiDAS's TSU incurs relatively small overheads and reduced power consumption, and its various operations complete with low latency. To back these claims, the proposed DDM-based TSU is compared with the Task Superscalar architecture, which implements the StarSs programming framework in hardware. The comparison shows that the proposed TSU requires far less hardware and energy to operate. Specifically, Task Superscalar is 4.94x larger than the DDM-supporting TSU in terms of slice register requirements and 11.34x larger in slice look-up table count. Last, the hardware TSU is compared with a software TSU implementation offering identical functionality, both running on an FPGA fabric under a synthetic application; a detailed performance evaluation shows that MiDAS's hardware TSU significantly outperforms its software-based counterpart. © 2019 Elsevier B.V. All rights reserved.
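To make "execution based on data availability" concrete, the following is a minimal software sketch of data-driven scheduling, not the MiDAS hardware design. It assumes each thread carries a ready count (the number of producer threads it still waits on) and becomes runnable when that count reaches zero. The abstract does not describe this bookkeeping, so the ready-count scheme, the function `run_data_driven`, and the thread names are all illustrative assumptions.

```python
from collections import deque


def run_data_driven(threads, consumers, ready_count):
    """Run thread bodies in an order dictated by data availability.

    All names here are hypothetical; this models the idea sequentially
    in software and says nothing about the real TSU's internals.

    threads:     dict of thread id -> zero-argument callable (thread body)
    consumers:   dict of thread id -> list of thread ids that use its output
    ready_count: dict of thread id -> number of producers it waits on
    """
    remaining = dict(ready_count)  # copy so the caller's dict is untouched
    # Threads with no outstanding producers are ready immediately.
    ready = deque(t for t in threads if remaining.get(t, 0) == 0)
    order = []
    while ready:
        tid = ready.popleft()
        threads[tid]()  # execute the thread body
        order.append(tid)
        # The thread's output is now available: update each consumer.
        for c in consumers.get(tid, []):
            remaining[c] -= 1
            if remaining[c] == 0:
                ready.append(c)
    if len(order) != len(threads):
        raise ValueError("some threads never became ready")
    return order


# Example: two independent loads feed an add, which feeds a store.
log = []
threads = {
    name: (lambda n=name: log.append(n))
    for name in ("load_a", "load_b", "add", "store")
}
consumers = {"load_a": ["add"], "load_b": ["add"], "add": ["store"]}
ready_count = {"load_a": 0, "load_b": 0, "add": 2, "store": 1}
order = run_data_driven(threads, consumers, ready_count)
```

In this sketch `add` cannot run until both loads have finished, and `store` cannot run until `add` has. A multi-core system would dispatch ready threads to different cores instead of running them one at a time.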
