Conference: International Conference on Reconfigurable Computing and FPGAs

Large-scale high-dimensional nearest neighbor search using flash memory with in-store processing


Abstract

Modern datasets of importance, such as images, videos, protein sequences, or text, usually contain very high-dimensional information from a search point of view. Nearest-neighbor search is one of the most fundamental building blocks in dealing with large amounts of data. It is the problem of finding the points in a database that are most similar to a query data point under some distance metric. There is a large body of work on algorithms for nearest-neighbor search on large high-dimensional datasets. Since these algorithms invariably involve random access to data, most existing implementations ensure that the data is stored in DRAM and does not spill into secondary storage such as hard disks. However, the immense size of modern datasets often requires hundreds of computers to accommodate the dataset in DRAM. An alternative to such a system is a much smaller cluster that stores the dataset in flash memory (instead of DRAM) and has in-store computing capability. In this paper, we build and demonstrate the performance of high-dimensional nearest-neighbor search on a flash-based system with FPGA acceleration and show that it sometimes exceeds the performance of a DRAM-based solution. We chose two example applications, images and documents, for this demonstration. Since flash storage, in comparison to DRAM, is an order of magnitude cheaper and consumes an order of magnitude less power, a flash-based solution for nearest-neighbor search offers a viable and attractive alternative.
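
To make the problem statement concrete, the following is a minimal sketch of exhaustive (brute-force) nearest-neighbor search in Python. It is not the method of the paper, which relies on flash storage and FPGA acceleration; it only illustrates "finding the points most similar to a query under some distance metric". The function names, the choice of Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(database, query, k=1, distance=euclidean):
    """Return the indices of the k database points closest to query, nearest first.

    Every point is scanned once, so the cost grows linearly with the
    database size and the dimensionality.
    """
    scored = ((distance(point, query), idx) for idx, point in enumerate(database))
    return [idx for _, idx in heapq.nsmallest(k, scored)]


# Hypothetical toy data: four 3-dimensional points and one query.
database = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (5.0, 5.0, 5.0), (0.9, 1.1, 1.0)]
query = (1.0, 1.0, 0.9)
print(nearest_neighbors(database, query, k=2))  # indices of the two closest points
```

A full scan like this touches every stored vector, which is why the location of the data (DRAM versus secondary storage) and the available bandwidth dominate performance at large scale.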
