首页> 外文会议>IEEE International Symposium on High Performance Computer Architecture >QuickNN: Memory and Performance Optimization of k-d Tree Based Nearest Neighbor Search for 3D Point Clouds
【24h】

QuickNN: Memory and Performance Optimization of k-d Tree Based Nearest Neighbor Search for 3D Point Clouds

机译:Quicknn:基于K-D树的存储器和性能优化3D点云搜索

获取原文

摘要

The use of Light Detection And Ranging (LiDAR) has enabled the continued improvement in accuracy and performance of autonomous navigation. The latest applications require LiDAR's of the highest spatial resolution, which generate a massive amount of 3D point clouds that need to be processed in real time. In this work, we investigate the architecture design for k-Nearest Neighbor (kNN) search, an important processing kernel for 3D point clouds. An approximate kNN search based on a k-dimensional (k-d) tree is employed to improve performance. However, even for today's moderate-sized problems, this approximate kNN search is severely hindered by memory bandwidth due to numerous random accesses and minimal data reuse opportunities. We apply several memory optimization schemes to alleviate the bandwidth bottleneck: 1) the k-d tree data structure is partitioned to two sets: tree nodes and point buckets, based on their distinct characteristics – tree nodes that have high reuse are cached for their lifetime to facilitate search, while point buckets with low reuse are organized in regular contiguous segments in external memory to facilitate efficient burst access; 2) write and read caches are added to gather random accesses to transform them to sequential accesses; and 3) tree construction and tree search are interleaved to cut redundant access streams. With optimized memory bandwidth, the kNN search can be further accelerated by two new processing schemes: 1) parallel tree traversal that utilizes multiple workers with minimal tree duplication overhead, and 2) incremental tree building that minimizes the overhead of tree construction by dynamically updating the tree instead of building it from scratch every time. We demonstrate the performance and memory-optimized QuickNN architecture on FPGA and perform exhaustive benchmarking, showing that up to a 19× and 7.3× speedup over k-d tree searches performed on a modern CPU and GPU, respectively, and a 14.5× speedup over a comparable sized architecture performing an exact search. Finally, we show that QuickNN achieves two orders of magnitude performance per watt increase over CPU and GPU methods.
机译:光检测和测距(LIDAR)的使用使得能够继续提高自主导航的准确性和性能。最新应用需要LIDAR的最高空间分辨率,这会产生需要实时处理的大量3D点云。在这项工作中,我们研究了K-Collest邻居(KNN)搜索的架构设计,这是3D点云的重要处理核心。基于K维(K-D)树的近似KNN搜索用于提高性能。然而,即使对于今天的中等大小问题,由于许多随机访问和最小的数据重用机会,这种近似KNN搜索也受到内存带宽严重阻碍。我们应用多个内存优化方案来缓解带宽瓶颈:1)KD树数据结构被划分为两组:树节点和点存储桶,基于它们的不同特性 - 具有高重用的树节点,缓存为其寿命以便于促进搜索,而具有低重用的点桶在外部内存中的常规连续段中组织,以便于有效的突发访问; 2)添加并读取缓存以收集随机访问以将它们转换为顺序访问; 3)树施工和树搜索是交错的,以削减冗余的访问流。通过优化的内存带宽,knn搜索可以通过两个新的处理方案进一步加速:1)并行树遍历,利用具有最小树复制开销的多个工人和2)增量树建筑,通过动态更新,最小化树施工的开销。树,而不是每次从头开始建造它。我们展示了FPGA上的性能和内存优化的Quicknn架构,并执行了详尽的基准,显示了在现代CPU和GPU上执行的KD树搜索的高达19×和7.3×加速,并通过可比较的加速14.5×加速大小架构执行精确搜索。最后,我们表明Quicknn通过CPU和GPU方法增加了每种瓦特的两个数量级性能。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号