Dart: Divide and Specialize for Fast Response to Congestion in RDMA-Based Datacenter Networks

Abstract

Though Remote Direct Memory Access (RDMA) promises to reduce datacenter network latencies significantly compared to TCP (e.g., $10\times$), end-to-end congestion control in the presence of incasts is a challenge. Targeting the full generality of the congestion problem, previous schemes rely on slow, iterative convergence to the appropriate sending rates (e.g., TIMELY takes 50 RTTs). Several papers have shown that even in oversubscribed datacenter networks most congestion occurs at the receiver. Accordingly, we propose a divide-and-specialize approach, called Dart, which isolates the common case of receiver congestion and further subdivides the remaining in-network congestion into the simpler spatially-localized and the harder spatially-dispersed cases. For receiver congestion, we propose direct apportioning of sending rates (DASR) in which a receiver for $n$ senders directs each sender to cut its rate by a factor of $n$, converging in only one RTT. For the spatially-localized case, Dart provides fast (under one RTT) response by adding novel switch hardware for in-order flow deflection (IOFD) because RDMA disallows packet reordering on which previous load balancing schemes rely. For the uncommon spatially-dispersed case, Dart falls back to DCQCN. Small-scale testbed measurements and at-scale simulations, respectively, show that Dart achieves 60% ($2.5\times$) and 79% ($4.8\times$) lower $99^{\text{th}}$-percentile latency, and similar and 58% higher throughput, than InfiniBand, and TIMELY and DCQCN.
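The DASR rule described in the abstract (a receiver with $n$ senders directs each sender to cut its rate by a factor of $n$) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `dasr_directives`, the dictionary layout, and the 100 Gb/s figures are assumptions made for illustration, and the sketch ignores how the receiver counts senders or delivers the directive.

```python
def dasr_directives(sender_rates_gbps: dict[str, float]) -> dict[str, float]:
    """Hypothetical helper: the rate each sender is told to use.

    Follows the abstract's rule literally: with n active senders,
    every sender's current rate is divided by n.
    """
    n = len(sender_rates_gbps)
    if n == 0:
        return {}  # no senders, nothing to apportion
    return {sender: rate / n for sender, rate in sender_rates_gbps.items()}


if __name__ == "__main__":
    # Assumed scenario: four senders, each starting at a 100 Gb/s line rate.
    incast = {f"s{i}": 100.0 for i in range(4)}
    new_rates = dasr_directives(incast)
    print(new_rates)                # per-sender rate after one directive
    print(sum(new_rates.values()))  # aggregate rate arriving at the receiver
```

If every sender starts at the receiver's line rate, a single directive brings the aggregate back to that line rate, which is consistent with the one-RTT convergence the abstract claims.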

Bibliographic details

Source
IEEE/ACM Transactions on Networking | 2020, Issue 1 | pp. 322-335 | 14 pages
Author affiliations

Purdue Univ Dept Elect & Comp Engn W Lafayette IN 47907 USA|NVIDIA Corp Santa Clara CA 95051 USA;

Univ Illinois Dept Comp Sci Chicago IL 60607 USA|VMware Inc Palo Alto CA 94304 USA;

Univ Illinois Dept Comp Sci Chicago IL 60607 USA;

Purdue Univ Dept Elect & Comp Engn W Lafayette IN 47907 USA;

Original format: PDF
Language: English
Keywords
Datacenters; RDMA; congestion control;
