Metric Index: An efficient and scalable solution for precise and approximate similarity search

David Novak; Michal Batko; Pavel Zezula

首页> 外文期刊>Information Systems >Metric Index: An efficient and scalable solution for precise and approximate similarity search

【24h】

Metric Index: An efficient and scalable solution for precise and approximate similarity search

机译：指标索引：高效，可扩展的解决方案，用于精确和近似的相似度搜索

获取原文

获取原文并翻译 | 示例

掌桥外文数据库（机构版） >>

开具论文收录证明 >>

页面导航

摘要
著录项
相似文献
相关主题

摘要

Metric space is a universal and versatile model of similarity that can be applied in various areas of information retrieval. However, a general, efficient, and scalable solution for metric data management is still a resisting research challenge. We introduce a novel indexing and searching mechanism called Metric Index (M-Index) that employs practically all known principles of metric space partitioning, pruning, and filtering, thus reaching high search performance while having constant building costs per object. The heart of the M-lndex is a general mapping mechanism that enables to actually store the data in established structures such as the B~+-tree or even in a distributed storage. We implemented the M-lndex with the B~+-tree and performed experiments on two datasets-the first is an artificial set of vectors and the other is a real-life dataset composed of a combination of five MPEG-7 visual descriptors extracted from a database of up to several million digital images. The experiments put several M-lndex variants under test and compare them with established techniques for both precise and approximate similarity search. The trials show that the M-lndex outperforms the others in terms of efficiency of search-space pruning, I/O costs, and response times for precise similarity queries. Further, the M-lndex demonstrates excellent ability to keep similar data close in the index which makes its approximation algorithm very efficient-maintaining practically constant response times while preserving a very high recall as the dataset grows and even beating approaches designed purely for approximate search.

机译：度量空间是一种通用且通用的相似性模型，可以应用于信息检索的各个领域。但是，用于度量数据管理的通用，高效且可扩展的解决方案仍然是一项严峻的研究挑战。我们介绍了一种称为Metric Index（M-Index）的新颖索引和搜索机制，该机制几乎采用了度量空间划分，修剪和过滤的所有已知原理，从而在具有稳定的每对象成本的同时达到了很高的搜索性能。 M-Index的核心是一种通用的映射机制，它能够将数据实际存储在已建立的结构（例如B +树）中，甚至存储在分布式存储中。我们使用B〜+树实现了M-Index，并在两个数据集上进行了实验-第一个是一组人工向量，另一个是由5个MPEG-7视觉描述符组合而成的真实数据集一个多达数百万个数字图像的数据库。实验对几种M-Index变体进行了测试，并将其与用于精确和近似相似性搜索的既定技术进行比较。试验表明，M-Index在搜索空间修剪效率，I / O成本和精确相似性查询的响应时间方面优于其他同类产品。此外，M-Index表现出出色的能力，可以将相似的数据保持在索引中，这使其近似算法非常有效，可以保持几乎恒定的响应时间，同时随着数据集的增长保持很高的查全率，甚至击败了纯粹为近似搜索而设计的方法。

著录项

来源
《Information Systems》 |2011年第4期|p.721-733|共13页
作者
David Novak; Michal Batko; Pavel Zezula;
展开▼
作者单位

Masaryk University, Brno, Czech Republic;

Masaryk University, Brno, Czech Republic;

Masaryk University, Brno, Czech Republic;

展开▼
收录信息美国《科学引文索引》(SCI);美国《工程索引》(EI);
原文格式 PDF
正文语种 eng
中图分类
关键词
metric space; similarity search; data structure; approximation; scalability;

机译：公制空间相似度搜索数据结构;近似;可扩展性;

相似文献

外文文献
中文文献
专利

1. Use of permutation prefixes for efficient and scalable approximate similarity search [J] . Andrea Esuli Information Processing & Management . 2012,第5期

机译：使用置换前缀进行有效和可扩展的近似相似性搜索
2. Efficient Metric Indexing for Similarity Search and Similarity Joins [J] . Lu Chen, Yunjun Gao, Xinhan Li, IEEE Transactions on Knowledge and Data Engineering . 2017,第3期

机译：用于相似搜索和相似联接的有效度量索引
3. Efficient Metric Indexing for Similarity Search and Similarity Joins [J] . Lu Chen, Yunjun Gao, Xinhan Li, Theoretical and Experimental Plant Physiology . 2017,第3期

机译：相似性搜索和相似性的高效度量索引
4. On Semantic Solutions for Efficient Approximate Similarity Search on Large-Scale Datasets [C] . Alexander Ocsa, Jose Luis Huillca, Cristian Lopez del Alamo Iberoamerican congress on pattern recognition . 2018

机译：大规模数据集上有效近似相似搜索的语义解决方案
5. A Search for Protein Function Similarity using Protein Structure Alignment Quality Metrics [D] . Johnson, Stephen Bernard 2014

机译：使用蛋白质结构比对质量度量标准的蛋白质功能相似性搜索
6. Approximate weakly efficient solutions of set-valued vector equilibrium problems [O] . Jian Chen, Yihong Xu, Ke Zhang -1

机译：集值向量平衡问题的近似弱有效解
7. PP-Index: using permutation prefixes for efficient and scalable approximate similarity search [O] . Esuli Andrea 2009

机译：PP-Index：使用置换前缀进行有效和可扩展的近似相似性搜索

Metric Index: An efficient and scalable solution for precise and approximate similarity search

摘要

著录项

相似文献

相关主题

期刊订阅