
Dart: Divide and Specialize for Fast Response to Congestion in RDMA-Based Datacenter Networks

Abstract

Though Remote Direct Memory Access (RDMA) promises to reduce datacenter network latencies significantly compared to TCP (e.g., 10$\times$), end-to-end congestion control in the presence of incasts is a challenge. Targeting the full generality of the congestion problem, previous schemes rely on slow, iterative convergence to the appropriate sending rates (e.g., TIMELY takes 50 RTTs). Several papers have shown that even in oversubscribed datacenter networks, most congestion occurs at the receiver. Accordingly, we propose a divide-and-specialize approach, called Dart, which isolates the common case of receiver congestion and further subdivides the remaining in-network congestion into the simpler spatially-localized case and the harder spatially-dispersed case. For receiver congestion, we propose direct apportioning of sending rates (DASR), in which a receiver for $n$ senders directs each sender to cut its rate by a factor of $n$, converging in only one RTT. For the spatially-localized case, Dart provides a fast (under one RTT) response by adding novel switch hardware for in-order flow deflection (IOFD), because RDMA disallows the packet reordering on which previous load-balancing schemes rely. For the uncommon spatially-dispersed case, Dart falls back to DCQCN. Small-scale testbed measurements show that Dart achieves 60% (2.5$\times$) lower 99th-percentile latency than InfiniBand with similar throughput; at-scale simulations show that Dart achieves 79% (4.8$\times$) lower 99th-percentile latency and 58% higher throughput than TIMELY and DCQCN.
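The DASR rule can be illustrated with a short Python sketch. It is a hypothetical model, not Dart's implementation: it assumes the receiver already knows which senders form the incast and their current rates, and it ignores how the rate-cut notification is actually carried. The helper names `dasr_new_rates` and `reduction_to_factor` are made up for illustration. The second helper checks the abstract's arithmetic, where "X% lower" latency corresponds to a factor of 1/(1 - X/100).

```python
def dasr_new_rates(sender_rates: dict[str, float]) -> dict[str, float]:
    """Hypothetical receiver-side DASR rule: with n senders, tell each one
    to cut its current rate by a factor of n (one notification, one RTT)."""
    n = len(sender_rates)
    if n <= 1:
        # A single sender is not an incast; leave its rate unchanged.
        return dict(sender_rates)
    return {sender: rate / n for sender, rate in sender_rates.items()}


def reduction_to_factor(percent_lower: float) -> float:
    """Convert 'X% lower latency' into 'k times lower latency'."""
    return 1.0 / (1.0 - percent_lower / 100.0)


if __name__ == "__main__":
    # Assumption: four senders, each starting at a 100 Gb/s line rate,
    # all targeting one receiver behind a 100 Gb/s link.
    rates = {"s1": 100.0, "s2": 100.0, "s3": 100.0, "s4": 100.0}
    new_rates = dasr_new_rates(rates)
    print(new_rates)                          # each sender now sends 25.0 Gb/s
    print(sum(new_rates.values()))            # aggregate matches the 100 Gb/s link
    print(round(reduction_to_factor(60), 1))  # 2.5
    print(round(reduction_to_factor(79), 1))  # 4.8
```

Under the stated assumption that every sender starts at line rate, the rates after a single cut sum exactly to the receiver's link rate, so this idealized model needs no further iterations. That is the contrast the abstract draws with schemes that converge iteratively over many RTTs.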

Bibliographic details

  • Source
    IEEE/ACM Transactions on Networking | 2020, No. 1 | pp. 322-335 | 14 pages
  • Author affiliations

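    Purdue University, Department of Electrical and Computer Engineering, West Lafayette, IN 47907, USA; NVIDIA Corp., Santa Clara, CA 95051, USA

    University of Illinois, Department of Computer Science, Chicago, IL 60607, USA; VMware Inc., Palo Alto, CA 94304, USA

    University of Illinois, Department of Computer Science, Chicago, IL 60607, USA

    Purdue University, Department of Electrical and Computer Engineering, West Lafayette, IN 47907, USA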

  • Original format: PDF
  • Language: English
  • Keywords

    Datacenters; RDMA; congestion control;
