International Conference on Algorithms and Architectures for Parallel Processing

PPS: A Low-Latency and Low-Complexity Switching Architecture Based on Packet Prefetch and Arbitration Prediction

Abstract

Interconnection networks increasingly bottleneck the performance of datacenters and HPC systems because of ever-increasing communication overhead. High-radix switches are widely deployed in interconnection networks to achieve higher throughput and lower latency. However, traffic bursts and micro-bursts can severely degrade network latency. In this paper, we propose a Prefetch and Prediction based Switch (PPS) that effectively reduces packet delay and eliminates the effect of traffic bursts. By using a dynamically allocated multi-queue (DAMQ) buffer with data prefetch, PPS performs concurrent writes and reads with zero delay, so packet scheduling is fully pipelined. We further propose a simple but efficient arbitration scheme that completes a packet arbitration within one clock cycle while maintaining high throughput. Moreover, by predicting arbitration results and filtering out requests likely to fail in the next round, our scheduling algorithm performs indistinguishably from iSLIP while using nearly half of iSLIP's area and 36.37% fewer look-up tables (LUTs). Owing to the DAMQ buffer with control-data prefetch and the two-level scheduling with arbitration prediction, PPS achieves low latency and high throughput. PPS can also easily extend its switching logic to a higher radix, because its hardware complexity grows linearly with the number of ports.
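The abstract names two generic building blocks, a DAMQ input buffer and a single-cycle arbiter, without describing their internals. The Python sketch below shows a textbook form of each, under stated assumptions: the DAMQ keeps one shared pool of buffer slots plus a linked-list FIFO per output port, so a burst headed to one output can use slots that other outputs are not using; the arbiter is a plain round-robin priority scan over the non-empty queues. The names `DAMQBuffer` and `round_robin_grant` are hypothetical, and the sketch does not model PPS's control-data prefetch or its arbitration prediction.

```python
from collections import deque


class DAMQBuffer:
    """Hypothetical DAMQ sketch: one shared slot pool, one linked-list FIFO per output.

    Slot count, port count and this layout are illustrative assumptions,
    not details taken from the PPS paper.
    """

    def __init__(self, num_slots: int, num_outputs: int) -> None:
        self.data = [None] * num_slots
        self.next = [-1] * num_slots          # link to the next slot in the same output queue
        self.free = deque(range(num_slots))   # free-slot list shared by all output queues
        self.head = [-1] * num_outputs        # head slot of each output queue (-1 = empty)
        self.tail = [-1] * num_outputs        # tail slot of each output queue

    def enqueue(self, output: int, packet) -> bool:
        """Store packet in any free slot and append it to its output's FIFO."""
        if not self.free:
            return False                      # buffer full: caller must drop or apply back-pressure
        slot = self.free.popleft()
        self.data[slot] = packet
        self.next[slot] = -1
        if self.tail[output] == -1:
            self.head[output] = slot          # queue was empty
        else:
            self.next[self.tail[output]] = slot
        self.tail[output] = slot
        return True

    def dequeue(self, output: int):
        """Remove and return the head packet for output, or None if its queue is empty."""
        slot = self.head[output]
        if slot == -1:
            return None
        packet = self.data[slot]
        self.head[output] = self.next[slot]
        if self.head[output] == -1:
            self.tail[output] = -1
        self.data[slot] = None
        self.free.append(slot)                # slot goes back to the shared pool
        return packet

    def nonempty(self) -> list:
        """Per-output request vector: True where that output queue holds a packet."""
        return [h != -1 for h in self.head]


def round_robin_grant(requests: list, pointer: int) -> int:
    """Grant the first requester at or after pointer, wrapping around; -1 if none.

    A hardware arbiter would typically move pointer to (winner + 1) after a grant.
    """
    n = len(requests)
    for offset in range(n):
        i = (pointer + offset) % n
        if requests[i]:
            return i
    return -1


# Usage: two packets for output 2 and one for output 0 share a 4-slot pool.
buf = DAMQBuffer(num_slots=4, num_outputs=3)
buf.enqueue(2, "p0")
buf.enqueue(0, "p1")
buf.enqueue(2, "p2")
winner = round_robin_grant(buf.nonempty(), pointer=1)  # → 2
print(buf.dequeue(winner))                              # prints "p0"
```

Because slots are allocated from a common free list rather than a fixed per-output partition, the buffer only fills when all slots are in use, regardless of how traffic is distributed across outputs.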