Conference: International Symposium on Pervasive Systems, Algorithms and Networks

Grid-Based Indexing and Search Algorithms for Large-Scale and High-Dimensional Data

Abstract

The rapid development of the Internet has recently resulted in massive information overload. This information is usually represented by high-dimensional feature vectors in many related applications, such as recognition, classification, and retrieval. These applications usually need efficient indexing and search methods for such large-scale, high-dimensional databases, which is typically a challenging task. Some efforts have been made that solve this problem to some extent. However, most of them are implemented on a single machine, which is not suitable for handling large-scale databases. In this paper, we present a novel data index structure and nearest neighbor search algorithm implemented on Apache Spark. We impose a grid on the database and index the data by non-empty grid cells. This grid-based index structure is simple and easy to implement in parallel. Moreover, we propose to build a scalable KNN graph on the grids, which increases the efficiency of this index structure at a low cost in a parallel implementation. Finally, experiments are conducted on both public and synthetic databases, showing that the proposed methods achieve overall high performance in both efficiency and accuracy.
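
The abstract gives no implementation details, so the following is only a minimal single-machine sketch of the general idea: index points by non-empty grid cell, link each cell to its nearest non-empty cells, and search by scanning the query's cell plus its linked neighbours. It is not the authors' Spark implementation. The function names, the uniform `cell_size`, the use of cell-coordinate distance to build the cell graph, and the brute-force graph construction are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def cell_of(point, cell_size):
    """Return the integer grid coordinates of the cell containing `point`."""
    return tuple(math.floor(x / cell_size) for x in point)


def build_grid_index(points, cell_size):
    """Map each non-empty cell to the ids of the points that fall in it."""
    index = defaultdict(list)
    for i, p in enumerate(points):
        index[cell_of(p, cell_size)].append(i)
    return dict(index)  # empty cells are never stored


def build_cell_knn_graph(index, k):
    """Link every non-empty cell to its k nearest non-empty cells.

    Assumption: cells are compared by the distance between their grid
    coordinates. This brute-force version is quadratic in the number of cells.
    """
    cells = list(index)
    graph = {}
    for c in cells:
        others = sorted((o for o in cells if o != c),
                        key=lambda o: math.dist(c, o))
        graph[c] = others[:k]
    return graph


def search(query, points, index, graph, cell_size, k):
    """Approximate k-NN: scan the query's cell and its graph neighbours."""
    home = cell_of(query, cell_size)
    if home not in index:
        # Fall back to the closest non-empty cell.
        home = min(index, key=lambda c: math.dist(c, home))
    candidates = list(index[home])
    for c in graph[home]:
        candidates.extend(index[c])
    candidates.sort(key=lambda i: math.dist(points[i], query))
    return candidates[:k]
```

A small usage example under the same assumptions:

```python
points = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9), (5.0, 5.0), (5.2, 5.1)]
index = build_grid_index(points, cell_size=1.0)
graph = build_cell_knn_graph(index, k=1)
nearest = search((0.1, 0.15), points, index, graph, cell_size=1.0, k=2)
```

Because each cell is built and scanned independently, a structure of this shape lends itself to partitioning by cell key in a parallel framework, which fits the abstract's claim that the index is easy to implement in parallel.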
