Published in: IEEE International Conference on Networking, Architecture and Storage

Efficient parallel packet processing using a shared memory many-core processor with hardware support to accelerate communication

Abstract

Software IP forwarding routers provide flexibility, programmability, and extensibility while enabling fast deployment. The key question is whether they can match the efficiency of their special-purpose hardware counterparts. Shared memory is a sine qua non for parallel programming on many commercial multicore processors, so it is the paradigm of choice for implementing software routers. For efficiency, shared memory is often implemented with hardware support for cache coherence and data consistency among the cores. Although this enables efficient data access in many common-case scenarios, communication between cores through shared memory synchronization primitives often limits scalability. In this paper we perform a thorough characterization of a multithreaded packet processing application to quantify the opportunities from exploiting concurrency and to identify scalability bottlenecks in future shared memory multicores. We propose to retain the shared memory model but introduce a set of lightweight, in-hardware, explicit messaging send/receive instructions in the instruction set architecture (ISA). These instructions are used to mitigate the overheads of multi-party communication in shared memory protocols. Using simulations of a 64-core multicore, we find that the scalability of parallel packet processing is limited by the packet ordering requirement, which leads to expensive implicit communication under shared memory. With explicit messaging support in the ISA, the communication bottleneck is mitigated, and the application scales to 30× at 64 cores.
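The contrast the abstract draws between implicit communication through shared memory and explicit send/receive can be illustrated with a small software analogy. The sketch below is not the paper's method: the proposed instructions are hardware ISA features whose details the abstract does not give. All names here (`process`, `ordered_shared_memory`, `ordered_explicit_messages`) are hypothetical, and Python threads with `queue.Queue` merely stand in for cores and hardware messages. Both functions process packets in parallel and emit results in arrival order; they differ only in how the "whose turn is it" information travels.

```python
import queue
import threading


def process(packet: int) -> int:
    """Stand-in for per-packet work (hypothetical; not from the paper)."""
    return packet * 2


def ordered_shared_memory(packets, n_workers=4):
    """Emit results in arrival order via a shared counter that all workers watch."""
    out = []
    next_seq = 0                      # shared state: sequence number allowed to emit
    cond = threading.Condition()

    def worker(wid):
        nonlocal next_seq
        for seq in range(wid, len(packets), n_workers):
            result = process(packets[seq])          # parallel part
            with cond:                              # ordering part: every worker contends here
                cond.wait_for(lambda: next_seq == seq)
                out.append(result)
                next_seq += 1
                cond.notify_all()                   # wakes all waiters, not only the next one

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


def ordered_explicit_messages(packets, n_workers=4):
    """Emit results in arrival order by passing a turn token point-to-point."""
    out = []
    inbox = [queue.Queue() for _ in range(n_workers)]   # one mailbox per worker

    def worker(wid):
        for seq in range(wid, len(packets), n_workers):
            result = process(packets[seq])              # parallel part
            inbox[wid].get()                            # "receive": block until it is my turn
            out.append(result)
            inbox[(wid + 1) % n_workers].put(seq + 1)   # "send": hand the turn to one worker

    inbox[0].put(0)                                     # the worker holding packet 0 goes first
    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

In the first version every worker polls the same shared variable, so each update is visible to, and wakes, all of them; in the second, each hand-off involves exactly one sender and one receiver. This only mirrors the shape of the argument in the abstract and says nothing about the measured speedups.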
