Journal: Journal of Parallel and Distributed Computing

Efficient low-latency packet processing using On-GPU Thread-Data Remapping

Abstract

Graphics processing units (GPUs) are widely used to accelerate packet processing in both physical and virtual networks. However, real-life packets vary widely in size, which causes severe GPU control-flow divergence. Previous solutions rely on CPU preprocessing to reduce divergence, but this precludes the more efficient NIC-GPU packet streaming, because packet batches must stop completely at the host machine. To fully utilize both GPU and PCIe resources, we propose Blink, a modular GPU software router. Instead of CPU preprocessing, Blink uses On-GPU Thread-Data Remapping to reduce divergence, and our novel Cross-Iteration Thread Event Signaling mechanism filters out unnecessary inter-thread synchronization, doubling the performance gain achieved by the traditional solution. Serving as a TCP/IP router with a Deep Packet Inspection (DPI) firewall, Blink sustains a processing throughput of 31.5 Gbit/s over a PCIe bandwidth of 32 Gbit/s. At a given bandwidth, Blink reduces processing latency by at least half compared with other works.
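The link between packet-size variation and control-flow divergence can be illustrated with a toy cost model. The sketch below is not Blink's algorithm. It assumes a 32-thread warp, one thread per packet, and a per-byte loop in which every warp runs in lockstep until its largest packet is finished. The helpers `warp_cost` and `remap` and the sample batch are hypothetical, introduced only for illustration.

```python
# Toy model only: shows why grouping similar-sized packets into the same warp
# reduces divergence. This is NOT the Blink implementation.

WARP_SIZE = 32  # assumed threads per warp


def warp_cost(sizes, warp_size=WARP_SIZE):
    """Total lockstep iterations: each warp runs as long as its largest packet."""
    return sum(
        max(sizes[i:i + warp_size]) for i in range(0, len(sizes), warp_size)
    )


def remap(sizes):
    """Return a thread-to-packet index mapping that groups similar sizes."""
    return sorted(range(len(sizes)), key=lambda i: sizes[i])


# Hypothetical batch: small (64-byte) and large (1500-byte) packets interleaved.
batch = [64, 1500] * 64  # 128 packets, i.e. 4 warps

before = warp_cost(batch)                       # every warp waits for a 1500-byte packet
mapping = remap(batch)                          # thread t now handles packet mapping[t]
after = warp_cost([batch[i] for i in mapping])  # two warps of small, two of large packets

print(before, after)
```

In this model the packets themselves never move. Only the thread-to-packet mapping changes, which is the general idea behind thread-data remapping. How Blink computes the mapping on the GPU and synchronizes threads is not described in the abstract.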