Grid-Based Indexing and Search Algorithms for Large-Scale and High-Dimensional Data



Abstract

The rapid development of the Internet has recently led to massive information overload. This information is usually represented by high-dimensional feature vectors in related applications such as recognition, classification, and retrieval. These applications need efficient indexing and search methods for large-scale, high-dimensional databases, which is typically a challenging task. Previous efforts have solved this problem to some extent. However, most of them are implemented on a single machine, which is not suitable for handling large-scale databases. In this paper, we present a novel data index structure and nearest neighbor search algorithm implemented on Apache Spark. We impose a grid on the database and index data by non-empty grid cells. This grid-based index structure is simple and easy to implement in parallel. Moreover, we propose building a scalable KNN graph on the grid, which increases the efficiency of the index structure at a low cost in the parallel implementation. Finally, experiments on both public and synthetic databases show that the proposed methods achieve high overall performance in both efficiency and accuracy.
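The abstract gives no implementation details, so the following is only a minimal single-machine sketch of the general idea, not the authors' Spark implementation. It assumes a uniform grid with a hypothetical `cell_size` parameter, a brute-force KNN graph built over the centers of non-empty cells, and an approximate search that scans the query's cell plus its graph neighbors. All function names are illustrative.

```python
from collections import defaultdict
import math


def cell_of(point, cell_size):
    """Map a point to the integer coordinates of the grid cell containing it."""
    return tuple(math.floor(x / cell_size) for x in point)


def cell_center(cell, cell_size):
    """Return the geometric center of a grid cell."""
    return tuple((c + 0.5) * cell_size for c in cell)


def build_grid_index(points, cell_size):
    """Index points by cell; only non-empty cells ever get an entry."""
    index = defaultdict(list)
    for pid, p in enumerate(points):
        index[cell_of(p, cell_size)].append(pid)
    return dict(index)


def build_cell_knn_graph(index, cell_size, k):
    """Link each non-empty cell to its k nearest non-empty cells.

    Brute force over cell centers (quadratic in the number of cells);
    an assumption made for brevity, not the paper's construction.
    """
    cells = list(index)
    graph = {}
    for c in cells:
        center = cell_center(c, cell_size)
        others = sorted(
            (o for o in cells if o != c),
            key=lambda o: math.dist(center, cell_center(o, cell_size)),
        )
        graph[c] = others[:k]
    return graph


def search(query, points, index, graph, cell_size):
    """Approximate nearest neighbor: scan the home cell and its graph neighbors."""
    home = cell_of(query, cell_size)
    if home not in index:
        # The query fell into an empty cell: start from the closest non-empty one.
        home = min(index, key=lambda c: math.dist(query, cell_center(c, cell_size)))
    candidates = [pid for c in [home] + graph[home] for pid in index[c]]
    return min(candidates, key=lambda pid: math.dist(query, points[pid]))


points = [(0.1, 0.1), (0.9, 0.8), (5.2, 5.1), (5.8, 5.9), (9.5, 0.5)]
index = build_grid_index(points, cell_size=1.0)
graph = build_cell_knn_graph(index, cell_size=1.0, k=1)
print(search((5.0, 5.0), points, index, graph, cell_size=1.0))
```

Because only non-empty cells are stored, memory grows with the number of occupied cells rather than with the full grid, which matters in high dimensions where most cells are empty. The search is approximate: a true nearest neighbor lying in a cell that is not among the scanned neighbors would be missed.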


Bibliographic details

Source
International Symposium on Pervasive Systems, Algorithms and Networks | 2017 | 532p | 6 pages
Conference location
Authors
Chuanfu Yang; Zhiyang Li; Wenyu Qu; Zhaobin Liu; Heng Qi
Author affiliations

Conference organizer
Original format: PDF
Text language
Chinese Library Classification: TP393-53
Keywords
Nearest neighbor searches; Partitioning algorithms; Clustering algorithms; Quantization (signal); Indexing


Similar literature


1. High-Dimensional Text Datasets Clustering Algorithm Based on Cuckoo Search and Latent Semantic Indexing [J]. Saida Ishak Boushaki, Nadjet Kamel, Omar Bendjeghaba. Journal of Information & Knowledge Management. 2018, Issue 3.
2. GACH: a grid-based algorithm for hierarchical clustering of high-dimensional data [J]. Eghbal G. Mansoori. Soft Computing: A Fusion of Foundations, Methodologies and Applications. 2014, Issue 5.
3. Grid-Based Indexing and Search Algorithms for Large-Scale and High-Dimensional Data [C]. Chuanfu Yang, Zhiyang Li, Wenyu Qu, 2017 14th International Symposium on Pervasive Systems, Algorithms and Networks & 2017 11th International Conference on Frontier of Computer Science and Technology & 2017 Third International Symposium of Creative Computing. 2017.
4. Search and indexing of high-dimensional feature spaces for similarity retrieval [D]. Wu, Peng. 2001.
5. The New and Computationally Efficient MIL-SOM Algorithm: Potential Benefits for Visualization and Analysis of a Large-Scale High-Dimensional Clinically Acquired Geographic Data [O]. Tonny J. Oyana, Luke E. K. Achenie, Joon Heo. 2012.
6. Strategi Pengembangan Industri Kecil Tahu Solo di Desa Punge Blang Cut Kecamatan Meuraxa Kota Banda Aceh [O]. Muhammad Purba, Lukman Hakim, Muhammad Wardhana. 2020.
