Efficient parallel packet processing using a shared memory many-core processor with hardware support to accelerate communication

Abstract

Software IP forwarding routers provide flexibility, programmability, and extensibility while enabling fast deployment. The key question is whether they can match the efficiency of their special-purpose hardware counterparts. Shared memory stands out as a sine qua non for parallel programming of many commercial multicore processors, so it is the paradigm of choice for implementing software routers. For efficiency, shared memory is often implemented with hardware support for cache coherence and data consistency among the cores. Although this enables efficient data access in many common-case scenarios, communication between cores using shared memory synchronization primitives often limits scalability. In this paper we perform a thorough characterization of a multithreaded packet processing application to quantify the opportunities from exploiting concurrency and to identify scalability bottlenecks in future shared memory multicores. We propose to retain the shared memory model but introduce a set of lightweight, in-hardware explicit messaging send/receive instructions in the instruction set architecture (ISA). These instructions are used to mitigate the overheads of multi-party communication in shared memory protocols. Using simulations of a 64-core multicore, we find that the scalability of parallel packet processing is limited by a packet ordering requirement that leads to expensive implicit communication under shared memory. With explicit messaging support in the ISA, the communication bottleneck is mitigated, and the application scales to 30× at 64 cores.
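
The contrast the abstract draws between implicit communication through shared memory and explicit send/receive can be illustrated with a small software analogy. The sketch below is not the authors' implementation and does not model their ISA instructions: Python threads stand in for cores, a `threading.Condition` around a shared counter stands in for shared memory synchronization, and per-worker `queue.Queue` inboxes stand in for hardware send/receive. The names `process`, `run_shared_memory`, and `run_message_passing`, the round-robin packet assignment, and the constants are all assumptions made for illustration.

```python
import queue
import threading

NUM_WORKERS = 4   # hypothetical core count for the sketch
NUM_PACKETS = 32  # hypothetical number of packets, numbered in arrival order


def process(seq):
    # Stand-in for per-packet work such as a route lookup (assumption).
    return seq * 2


def run_shared_memory():
    """Enforce output order through one shared counter that every worker watches."""
    out = []
    next_seq = 0
    cond = threading.Condition()

    def worker(wid):
        nonlocal next_seq
        # Round-robin assignment: worker wid handles packets wid, wid + N, ...
        for seq in range(wid, NUM_PACKETS, NUM_WORKERS):
            result = process(seq)  # parallel part
            with cond:
                # All workers synchronize on the same shared state.
                cond.wait_for(lambda: next_seq == seq)
                out.append((seq, result))
                next_seq += 1
                cond.notify_all()  # wakes every waiter, though only one can proceed

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out


def run_message_passing():
    """Enforce output order by passing a token point-to-point between workers."""
    out = []
    inbox = [queue.Queue() for _ in range(NUM_WORKERS)]

    def worker(wid):
        for seq in range(wid, NUM_PACKETS, NUM_WORKERS):
            result = process(seq)           # parallel part
            token = inbox[wid].get()        # "receive": wait for the predecessor
            assert token == seq
            out.append((seq, result))
            # "send": notify only the worker that holds the next packet.
            inbox[(wid + 1) % NUM_WORKERS].put(seq + 1)

    inbox[0].put(0)  # seed the token for packet 0
    threads = [threading.Thread(target=worker, args=(w,)) for w in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

Both functions return the packets in arrival order. The difference is in who communicates: the shared-counter version makes every worker observe each update, while the token version involves only the sender and the single successor, which is the kind of multi-party overhead the abstract says explicit messaging is meant to reduce.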

Bibliographic details

Source
IEEE International Conference on Networking, Architecture and Storage, 2015, pp. 122-129 (8 pages)
Authors
Hijaz Farrukh; Kahne Brian; Wilson Peter; Khan Omer
Original format: PDF
Keywords
IP Forwarding; Message Passing; Multicores; Shared Memory; Workload Characterization
