International Conference on Field-Programmable Technology

CHIP-KNN: A Configurable and High-Performance K-Nearest Neighbors Accelerator on Cloud FPGAs



Abstract

The k-nearest neighbors (KNN) algorithm is an essential algorithm in many applications, such as similarity search, image classification, and database query. With the rapid growth in dataset sizes and the feature dimension of each data point, processing KNN becomes increasingly compute- and memory-intensive. Most prior studies focus on accelerating the computation of KNN using the abundant parallel resources on FPGAs. However, they often overlook memory access optimizations on FPGA platforms and achieve only a marginal speedup over a multithreaded CPU implementation for large datasets. In this paper, we design and implement CHIP-KNN, an HLS-based, configurable, and high-performance KNN accelerator that optimizes off-chip memory access on cloud FPGAs with multiple DRAM or HBM (high-bandwidth memory) banks. CHIP-KNN is configurable for all essential parameters of the algorithm, including the size of the search dataset, the feature dimension of each data point, the distance metric, and the number of nearest neighbors (K). To optimize its performance, we build an analytical performance model to explore the design space and balance computation and memory access performance. Given a user configuration of the KNN parameters, our tool can automatically generate the optimal accelerator design for the given FPGA platform. Our experimental results on the Nimbix cloud computing platform show that, compared to a 16-thread CPU implementation, CHIP-KNN on the Xilinx Alveo U200 FPGA board with four DRAM banks and on the U280 FPGA board with HBM achieves average speedups of 7.5x and 19.8x, and performance/dollar improvements of 6.1x and 16.0x, respectively.
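To make concrete the computation the accelerator targets, the following is a minimal brute-force KNN sketch (a hypothetical reference implementation, not the paper's HLS design; the Euclidean metric and function names are illustrative). It computes O(N·D) distances and keeps the K smallest, which is the distance-compute plus Top-K selection pipeline that CHIP-KNN parallelizes while streaming the dataset from DRAM/HBM banks:

```python
import heapq
import math

def knn(query, dataset, k):
    """Return the k points in `dataset` nearest to `query`.

    Brute-force reference: compute the distance from the query to every
    point, then select the K smallest. The distance metric here is
    Euclidean; CHIP-KNN makes the metric a configurable parameter.
    """
    def euclidean(p):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(query, p)))
    # heapq.nsmallest performs the Top-K selection over all N distances.
    return heapq.nsmallest(k, dataset, key=euclidean)

# Example: 2-D points, K = 2
points = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0), (0.5, 0.0)]
print(knn((0.0, 0.0), points, k=2))
```

On an FPGA, the per-point distance computations are independent and map naturally onto parallel compute units, so for large datasets the bottleneck shifts to off-chip memory bandwidth — the problem the paper's multi-bank DRAM/HBM access optimization addresses.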
