Fast Online Set Intersection for Network Processing on FPGA

Yun R. Qu; Viktor K. Prasanna

Abstract

Online set intersection operations have been widely used in network processing tasks such as Quality of Service differentiation, firewall processing, and packet/traffic classification. The major challenge for online set intersection is to sustain line-rate processing speed, so accelerating set intersection using state-of-the-art hardware devices is of great interest to the research community. In this paper, we present a novel high-performance set intersection approach on FPGA. In our approach, each element in any set is represented by a combination of a Group ID (GID) and a Bit Stride (BS); all the sets are intersected using linear merge techniques and bitwise AND operations. We map our online set intersection algorithm onto hardware by constructing a modular Processing Element (PE) and concatenating multiple PEs into a tree-based parallel architecture. To improve the throughput on a state-of-the-art FPGA, we feed all the inputs to the FPGA in a streaming fashion with the help of synchronization GIDs. Post-place-and-route results show that, for a typical set intersection problem in network processing, our design can intersect eight sets, each of up to 32 K elements, at a throughput of 47.4 Thousand Intersections Per Second (KIPS) and a latency of 94.8 μs per batch of inputs. Compared to the classic linear merge or bitwise AND techniques on state-of-the-art multi-core processors, our designs on FPGA achieve up to 66× throughput improvement and 80× latency reduction.
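
The GID/BS representation and the merge-then-AND step described in the abstract can be illustrated with a small software sketch. It assumes non-negative integer element IDs and a hypothetical 32-bit stride width (STRIDE_BITS); the function names encode, decode, intersect_pair, and intersect_tree are illustrative and not taken from the paper. The pairwise reduction only mimics the tree of PEs in software; it does not model the authors' streaming FPGA pipeline or their synchronization GIDs.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical bit-stride width W; the paper's actual width may differ.
STRIDE_BITS = 32

# An encoded set: (GID, bit-stride mask) pairs sorted by GID.
Encoded = List[Tuple[int, int]]


def encode(elements: Iterable[int], width: int = STRIDE_BITS) -> Encoded:
    """Represent each element e as (GID, BS): GID = e // width, BS bit = e % width."""
    groups: Dict[int, int] = {}
    for e in elements:
        gid, offset = divmod(e, width)
        groups[gid] = groups.get(gid, 0) | (1 << offset)
    return sorted(groups.items())


def decode(encoded: Encoded, width: int = STRIDE_BITS) -> List[int]:
    """Expand (GID, BS) pairs back into a sorted list of element IDs."""
    out: List[int] = []
    for gid, mask in encoded:
        for offset in range(width):
            if (mask >> offset) & 1:
                out.append(gid * width + offset)
    return out


def intersect_pair(a: Encoded, b: Encoded) -> Encoded:
    """Linear merge on GIDs; bitwise AND of the bit strides when GIDs match."""
    i = j = 0
    result: Encoded = []
    while i < len(a) and j < len(b):
        gid_a, mask_a = a[i]
        gid_b, mask_b = b[j]
        if gid_a < gid_b:
            i += 1
        elif gid_a > gid_b:
            j += 1
        else:
            mask = mask_a & mask_b
            if mask:  # drop groups whose AND leaves no common element
                result.append((gid_a, mask))
            i += 1
            j += 1
    return result


def intersect_tree(sets: List[Encoded]) -> Encoded:
    """Intersect level by level in pairs, like a binary tree of PEs."""
    if not sets:
        return []
    level = list(sets)
    while len(level) > 1:
        nxt = [intersect_pair(level[k], level[k + 1])
               for k in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an unpaired input passes to the next level
        level = nxt
    return level[0]


if __name__ == "__main__":
    inputs = ({1, 5, 33, 70, 100}, {5, 33, 64, 70}, {0, 5, 70, 71})
    print(decode(intersect_tree([encode(s) for s in inputs])))  # [5, 70]
```

In this sketch, grouping elements by GID lets one AND test up to STRIDE_BITS candidate elements at once, while the merge only walks the lists of GIDs, which are shorter than the element lists whenever elements share groups.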

机译：在线集交叉口操作已广泛用于网络处理任务中，例如服务质量区分，防火墙处理和数据包/流量分类。在线集合路口的主要挑战是维持线速处理速度。使用最新的硬件设备来加速集合路口是研究界非常感兴趣的。在本文中，我们提出了一种在FPGA上的新型高性能集合交集方法。在我们的方法中，任何组中的每个元素都由组ID（GID）和位跨度（BS）的组合表示；使用线性合并技术和按位与运算将所有集合相交。我们将在线集合相交算法映射到硬件上；这是通过构造模块化处理元素（PE）并将多个PE连接到基于树的并行体系结构中来完成的。为了提高最新型FPGA的吞吐量，我们借助同步GID以流方式将所有输入馈送到FPGA。布局后的结果表明，对于网络处理中的典型集合相交问题，我们的设计可以相交8个集合，每个集合最多32 K元素，吞吐量为每秒47.4千个交叉点（KIPS）和延迟每批输入94.8μs。与先进的多核处理器上的经典线性合并或按位与技术相比，我们在FPGA上的设计实现了高达66倍的吞吐量提高和80倍的延迟减少。

Bibliographic Information

Source: IEEE Transactions on Parallel and Distributed Systems, 2016, No. 11, pp. 3214-3225 (12 pages)
Authors: Yun R. Qu; Viktor K. Prasanna
Affiliations: Ming Hsieh Department of Electrical Engineering, University of Southern California, Los Angeles, CA (both authors)
Original format: PDF
Language: English
Keywords: Field programmable gate arrays; Memory management; Hardware; Engines; Throughput; Indexes; Data structures

机译：现场可编程门阵列;内存管理;硬件;引擎;吞吐量;索引;数据结构;

Similar Articles

1. P4 to FPGA-A Fast Approach for Generating Efficient Network Processors [J]. Cao Zhuang, Su Huayou, Yang Qianming. Quality Control, Transactions, 2020.
2. Graph-Based Approaches to Placement of Processing Element Networks on FPGAs for Physical Model Simulation [J]. Bailey Miller, Frank Vahid, Tony Givargis. ACM Transactions on Reconfigurable Technology and Systems, 2015, No. 4.
3. Using the Minimum Set of Input Combinations to Minimize the Area of Local Routing Networks in Logic Clusters Containing Logically Equivalent I/Os in FPGAs [J]. A. G. Ye. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2010, No. 1.
4. Faster Unbalanced Private Set Intersection [C]. Amanda C. Davi Resende, Diego F. Aranha. International Conference on Financial Cryptography and Data Security, 2018.
5. Layer-type Specialized Processing Engines for a Semi-Streaming Convolutional Neural Network Hardware Architecture for FPGAs [D]. Nazariy Shaydyuk. 2020.
6. Mapping Neural Networks to FPGA-Based IoT Devices for Ultra-Low Latency Processing [O]. Maciej Wielgosz, Michał Karwatowski. 2019.
7. Design and implementation of fast and hardware-efficient parallel processing elements to set full and partial permutations in Beneš networks [O]. Labson Koloko, Takahiro Matsumoto, Hitoshi Obara. 2021.