首页> 外文期刊>Information Systems >Metric Index: An efficient and scalable solution for precise and approximate similarity search
【24h】

Metric Index: An efficient and scalable solution for precise and approximate similarity search

机译:指标索引:高效,可扩展的解决方案,用于精确和近似的相似度搜索

获取原文
获取原文并翻译 | 示例
       

摘要

Metric space is a universal and versatile model of similarity that can be applied in various areas of information retrieval. However, a general, efficient, and scalable solution for metric data management is still a resisting research challenge. We introduce a novel indexing and searching mechanism called Metric Index (M-Index) that employs practically all known principles of metric space partitioning, pruning, and filtering, thus reaching high search performance while having constant building costs per object. The heart of the M-lndex is a general mapping mechanism that enables to actually store the data in established structures such as the B~+-tree or even in a distributed storage. We implemented the M-lndex with the B~+-tree and performed experiments on two datasets-the first is an artificial set of vectors and the other is a real-life dataset composed of a combination of five MPEG-7 visual descriptors extracted from a database of up to several million digital images. The experiments put several M-lndex variants under test and compare them with established techniques for both precise and approximate similarity search. The trials show that the M-lndex outperforms the others in terms of efficiency of search-space pruning, I/O costs, and response times for precise similarity queries. Further, the M-lndex demonstrates excellent ability to keep similar data close in the index which makes its approximation algorithm very efficient-maintaining practically constant response times while preserving a very high recall as the dataset grows and even beating approaches designed purely for approximate search.
机译:度量空间是一种通用且通用的相似性模型,可以应用于信息检索的各个领域。但是,用于度量数据管理的通用,高效且可扩展的解决方案仍然是一项严峻的研究挑战。我们介绍了一种称为Metric Index(M-Index)的新颖索引和搜索机制,该机制几乎采用了度量空间划分,修剪和过滤的所有已知原理,从而在具有稳定的每对象成本的同时达到了很高的搜索性能。 M-Index的核心是一种通用的映射机制,它能够将数据实际存储在已建立的结构(例如B +树)中,甚至存储在分布式存储中。我们使用B〜+树实现了M-Index,并在两个数据集上进行了实验-第一个是一组人工向量,另一个是由5个MPEG-7视觉描述符组合而成的真实数据集一个多达数百万个数字图像的数据库。实验对几种M-Index变体进行了测试,并将其与用于精确和近似相似性搜索的既定技术进行比较。试验表明,M-Index在搜索空间修剪效率,I / O成本和精确相似性查询的响应时间方面优于其他同类产品。此外,M-Index表现出出色的能力,可以将相似的数据保持在索引中,这使其近似算法非常有效,可以保持几乎恒定的响应时间,同时随着数据集的增长保持很高的查全率,甚至击败了纯粹为近似搜索而设计的方法。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号