Minimizing the usage of hardware counters for collective communication using triggered operations

Abstract

Triggered operations and counting events (counters) are building blocks used by communication libraries, such as MPI, to offload collective operations to the Host Fabric Interface (HFI) or Network Interface Card (NIC). A triggered operation schedules a network or arithmetic operation to occur in the future, when a trigger counter reaches a specified threshold. On completion of the operation, the value of a completion counter increases by one. With this mechanism, it is possible to create a chain of dependent operations, so that an operation is triggered only when all the operations it depends on have completed.

Triggered operations rely on hardware counters on the HFI, which are a limited resource. Thus, if the number of required counters exceeds the number of hardware counters, a collective needs to stall until a previous collective completes and its counters are released. In addition, if the HFI has a counter cache, using a large number of counters can cause cache thrashing and degrade performance. Therefore, it is important to reduce the number of counters, especially when running on a large supercomputer or when an application uses non-blocking collectives and multiple collectives can run concurrently. Moreover, because counters are a scarce resource, it is important for the MPI library to be able to estimate the number of counters required by a collective so that it can fall back to the software implementation when the number of available counters is less than the required number.

In this paper, we propose an algorithm to optimize the number of hardware counters used when offloading collectives with triggered operations. With our algorithm, different operations can share and re-use trigger and completion counters based on the dependences among them and their topological orderings. We also propose models to estimate the number of counters required by different collectives when using the optimization algorithm. While the proposed counter optimization algorithm assumes that the dependences among the operations in a collective are represented using a Directed Acyclic Graph (DAG), there may be cases in which no DAG is provided for the collective. In this paper, we also discuss how the usage of counters can be optimized in such cases. Our experimental results show that the proposed algorithm significantly reduces the number of counters compared with a naive approach that does not consider the dependences among the operations. © 2020 Elsevier B.V. All rights reserved.
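
The trigger/threshold/completion mechanism described in the abstract can be illustrated with a small software model. The sketch below is not the paper's optimization algorithm and does not use any real HFI or NIC API; the `Counter` and `TriggeredOp` classes, the `run` loop, and the operation names are all hypothetical. It only shows how two operations that feed the same dependent can increment one shared completion counter, so the dependent fires at threshold 2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Counter:
    """Hypothetical stand-in for one hardware counter."""
    name: str
    value: int = 0


@dataclass
class TriggeredOp:
    """Fires once `trigger.value >= threshold` (or at once if trigger is None)."""
    name: str
    trigger: Optional[Counter]
    threshold: int
    completion: Optional[Counter]  # incremented by one when the op completes
    fired: bool = False


def run(ops: List[TriggeredOp]) -> List[str]:
    """Repeatedly fire every ready operation until no further progress is made."""
    order: List[str] = []
    progress = True
    while progress:
        progress = False
        for op in ops:
            ready = op.trigger is None or op.trigger.value >= op.threshold
            if ready and not op.fired:
                op.fired = True
                order.append(op.name)
                if op.completion is not None:
                    op.completion.value += 1
                progress = True
    return order


# Two receives feed one reduction, which then feeds one send (assumed example).
c_recv = Counter("c_recv")      # shared completion counter of both receives
c_reduce = Counter("c_reduce")  # completion counter of the reduction
counters = [c_recv, c_reduce]

ops = [
    TriggeredOp("send_up", trigger=c_reduce, threshold=1, completion=None),
    TriggeredOp("reduce", trigger=c_recv, threshold=2, completion=c_reduce),
    TriggeredOp("recv_left", trigger=None, threshold=0, completion=c_recv),
    TriggeredOp("recv_right", trigger=None, threshold=0, completion=c_recv),
]

order = run(ops)
print(order, [(c.name, c.value) for c in counters])
```

Although the operations are listed out of dependency order, the reduction runs only after both receives have completed, and the send only after the reduction. In this toy chain, four operations are served by two counters because the two receives share one.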

Bibliographic details

  • Source
    Parallel Computing | 2020, Issue 10 | 102636.1-102636.14 | 14 pages
  • Author affiliations

    Intel Corp Santa Clara CA 95051 USA;

    Intel Corp Santa Clara CA 95051 USA;

    Intel Corp Santa Clara CA 95051 USA;

    Intel Corp Santa Clara CA 95051 USA;

    Intel Corp Santa Clara CA 95051 USA;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English