
Accelerating Equi-Join on a CPU-FPGA Heterogeneous Platform



Abstract

Accelerating database applications using FPGAs has recently been an area of growing interest in both academia and industry. Equi-join is one of the key database operations, and its performance depends heavily on sorting, which has high memory usage on an FPGA. A fully pipelined N-key merge sorter consists of log N sorting stages and uses O(N) memory in total. For large data sets, external memory has to be employed to buffer data between the sorting stages. This introduces pipeline stalls as well as several iterations between the FPGA and external memory, causing significant performance degradation. In this paper, we speed up equi-join using a hybrid CPU-FPGA heterogeneous platform. To alleviate the performance impact of limited memory, we propose a merge-sort-based hybrid design in which the first few sorting stages of the merge sort tree are replaced with "folded" bitonic sorting networks. These "folded" bitonic sorting networks operate in parallel on the FPGA. The partial results are then merged on the CPU to produce the final sorted result. Based on this hybrid sorting design, we develop two streaming join algorithms by optimizing the classic CPU-based nested-loop join and sort-merge join algorithms. Over a range of data set sizes, our design achieves throughput improvements of 3.1x and 1.9x compared with software-only and FPGA-only implementations, respectively. Our design sustains 21.6% of the peak bandwidth, which is 3.9x the utilization obtained by the state-of-the-art FPGA equi-join implementation.
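The hybrid sorting idea in the abstract can be illustrated with a small software-only sketch. This is not the authors' implementation: it runs sequentially in Python, whereas the paper places the "folded" bitonic networks on the FPGA and runs them in parallel. The function names (`bitonic_sort`, `hybrid_sort`, `sort_merge_join`), the `block_size` parameter, and the `(key, payload)` record layout are all assumptions made for illustration. Fixed-size blocks are sorted with a bitonic network, the sorted runs are merged in a CPU-style k-way merge, and the sorted inputs feed a plain sort-merge equi-join.

```python
import heapq

PAD = object()  # hypothetical sentinel used to pad the last block


def _sort_key(rec):
    # Padding sorts after every real record; real records sort by their key.
    return (1, 0) if rec is PAD else (0, rec[0])


def bitonic_sort(block, key):
    """Sort a power-of-two-sized list with a bitonic compare-exchange network."""
    a = list(block)
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "block length must be a power of two"
    k = 2
    while k <= n:            # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:        # compare-exchange distance within this stage
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (key(a[i]) > key(a[partner])) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a


def hybrid_sort(records, block_size=8):
    """Bitonic-sort fixed-size blocks, then k-way merge the sorted runs."""
    runs = []
    for start in range(0, len(records), block_size):
        block = list(records[start:start + block_size])
        block += [PAD] * (block_size - len(block))   # pad the final block
        sorted_block = bitonic_sort(block, _sort_key)
        runs.append([r for r in sorted_block if r is not PAD])
    # Stand-in for the CPU-side merge of partial results.
    return list(heapq.merge(*runs, key=lambda r: r[0]))


def sort_merge_join(left, right, block_size=8):
    """Equi-join two lists of (key, payload) records on key."""
    l = hybrid_sort(left, block_size)
    r = hybrid_sort(right, block_size)
    out = []
    i = j = 0
    while i < len(l) and j < len(r):
        if l[i][0] < r[j][0]:
            i += 1
        elif l[i][0] > r[j][0]:
            j += 1
        else:
            key = l[i][0]
            i_end = i
            while i_end < len(l) and l[i_end][0] == key:
                i_end += 1
            j_end = j
            while j_end < len(r) and r[j_end][0] == key:
                j_end += 1
            # Emit every pairing of the two groups that share this key.
            for a in l[i:i_end]:
                for b in r[j:j_end]:
                    out.append((key, a[1], b[1]))
            i, j = i_end, j_end
    return out
```

The block size here plays the role of the on-chip sorter width: larger blocks mean fewer runs for the merge step but a larger sorting network. The bitonic network is not stable, so records with equal keys may come out in any relative order.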
