ACM Transactions on Reconfigurable Technology and Systems

Benefits of Adding Hardware Support for Broadcast and Reduce Operations in MPSoC Applications



Abstract

MPI has been used as a parallel programming model for supercomputers and clusters and recently in Multi-Processor Systems-on-Chip (MPSoC). One component of MPI is collective communication and its performance is key for certain parallel applications to achieve good speedups. Previous work showed that, with synthetic communication-only benchmarks, communication improvements of up to 11.4-fold and 22-fold for broadcast and reduce operations, respectively, can be achieved by providing hardware support at the network level in a Network-on-Chip (NoC). However, these numbers do not provide a good estimation of the advantage for actual applications, as there are other factors that affect performance besides communications, such as computation. To this end, we extend our previous work by evaluating the impact of hardware support over a set of five parallel application kernels of varying computation-to-communication ratios. By introducing some useful computation to the performance evaluation, we obtain more representative results of the benefits of adding hardware support for broadcast and reduce operations. The experiments show that applications with lower computation-to-communication ratios benefit the most from hardware support as they highly depend on efficient collective communications to achieve better scalability. We also extend our work by doing more analysis on clock frequency, resource usage, power, and energy. The results show reasonable scalability for resource utilization and power in the network interfaces as the number of channels increases and that, even though more power is dissipated in the network interfaces due to the added hardware, the total energy used can still be less if the actual speedup is sufficient. The application kernels are executed in a 24-embedded-processor system distributed across four FPGAs.
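The abstract's two qualitative claims can be illustrated with a small back-of-the-envelope model. This is only a sketch under stated assumptions, not the authors' evaluation method. It assumes an Amdahl-style split of runtime into computation and collective communication, and it assumes energy is average power multiplied by execution time. The function names, the communication fractions, and the power ratio are hypothetical. The 11.4-fold figure is the upper bound the abstract reports for broadcast on synthetic benchmarks.

```python
def overall_speedup(comm_fraction: float, comm_speedup: float) -> float:
    """Amdahl-style estimate (an assumption, not the paper's model).

    comm_fraction: share of baseline runtime spent in collective communication.
    comm_speedup:  factor by which hardware support accelerates that share.
    """
    if not 0.0 <= comm_fraction <= 1.0:
        raise ValueError("comm_fraction must be in [0, 1]")
    if comm_speedup <= 0.0:
        raise ValueError("comm_speedup must be positive")
    # Computation time is unchanged; only the communication share shrinks.
    return 1.0 / ((1.0 - comm_fraction) + comm_fraction / comm_speedup)


def saves_energy(power_ratio: float, speedup: float) -> bool:
    """Energy = power x time, so total energy drops when the relative
    power increase (power_ratio) is smaller than the achieved speedup."""
    return power_ratio < speedup


if __name__ == "__main__":
    # Hypothetical communication fractions; 11.4 is the broadcast upper
    # bound quoted in the abstract for communication-only benchmarks.
    for comm_fraction in (0.1, 0.5, 0.9):
        s = overall_speedup(comm_fraction, 11.4)
        print(f"comm fraction {comm_fraction:.1f}: overall speedup {s:.2f}x")
    # Hypothetical 20% power increase compared against a 1.5x speedup.
    print("energy saved:", saves_energy(1.2, 1.5))
```

In this model, a kernel that spends most of its time communicating (a low computation-to-communication ratio) approaches the full communication speedup, while a computation-heavy kernel sees little change. That matches the trend the abstract describes.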
