Conference: Symposium on High Performance Computing Systems

A Fast Similarity Search kNN for Textual Datasets


Abstract

k-nearest neighbors (kNN) is an algorithm for finding the k closest points in a metric space. Because of its high computational cost, many parallel solutions have been proposed, including some implementations targeted at modern accelerators. However, most approaches assume relatively low-dimensional, dense data. These assumptions do not hold for textual datasets, which are known for their high dimensionality and sparsity. This work presents a fine-grained parallel algorithm that applies a filtering technique based on the most common important terms of the query document, using an inverted index, together with its implementation on a GPU. Our method speeds up top-k nearest neighbor search in textual datasets by up to 37× on a single GPU.
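The filtering idea in the abstract can be illustrated with a minimal sequential sketch. This is not the authors' GPU implementation: it assumes documents are sparse term-to-weight dictionaries (for example TF-IDF), that "important" terms are the highest-weighted terms of the query, and that similarity is cosine similarity. The names `build_inverted_index`, `knn_filtered`, and `num_filter_terms` are hypothetical and chosen only for this example.

```python
import heapq
import math
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to a list of (doc_id, weight) postings."""
    index = defaultdict(list)
    for doc_id, vec in enumerate(docs):
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index


def knn_filtered(query, docs, index, k, num_filter_terms=2):
    """Return up to k (cosine_similarity, doc_id) pairs, best first."""
    # Assumption: the "important" terms are the highest-weighted query terms.
    top_terms = heapq.nlargest(num_filter_terms, query, key=query.get)

    # Filtering step: only documents that share at least one important
    # term with the query are kept as candidates.
    candidates = {doc_id for t in top_terms for doc_id, _ in index.get(t, [])}

    # Accumulate dot products through the postings lists, so only the
    # nonzero entries of the sparse vectors are ever touched.
    dots = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, d_weight in index.get(term, []):
            if doc_id in candidates:
                dots[doc_id] += q_weight * d_weight

    q_norm = math.sqrt(sum(w * w for w in query.values()))
    scored = []
    for doc_id, dot in dots.items():
        d_norm = math.sqrt(sum(w * w for w in docs[doc_id].values()))
        scored.append((dot / (q_norm * d_norm), doc_id))

    # Keep only the k most similar candidates.
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    docs = [
        {"gpu": 0.9, "knn": 0.4},
        {"cat": 1.0},
        {"gpu": 0.2, "index": 0.8},
    ]
    index = build_inverted_index(docs)
    query = {"gpu": 1.0, "knn": 0.5, "the": 0.1}
    for score, doc_id in knn_filtered(query, docs, index, k=2):
        print(doc_id, round(score, 3))
```

In this sketch the candidate set shrinks as `num_filter_terms` decreases, which trades recall for less similarity computation. A GPU version would presumably process postings lists and candidates in parallel, but how that work is divided is not described in the abstract.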
