Similarity search for large-scale image datasets.

Abstract

Content-based image similarity search is a difficult problem because of the high dimensionality and the usually massive volume of image data. The main challenge is to achieve high-quality similarity search with high speed and low space usage. This thesis proposes several techniques for building a similarity search system for large-scale image datasets. A prototype image search system, CASS-Image (Content-Aware Search System for Images), has been implemented to demonstrate the effectiveness of these techniques.

The first contribution of this thesis is a sketch construction algorithm that converts high-dimensional feature vectors into bit vectors (sketches) such that the weighted (and thresholded) ℓ1 distance between two feature vectors can be estimated from the Hamming distance between their sketches. Experimental results show that sketches typically reduce the space requirement by an order of magnitude with minimal impact on similarity search quality.

The second is a hash-perturbation-based LSH (Locality-Sensitive Hashing) technique for approximate nearest neighbor search in high dimensions. It probes multiple buckets in each hash table by perturbing the hashed value of the query object. Performance evaluations show that this method is both time- and space-efficient: it has time efficiency similar to that of the basic LSH method while reducing the space requirement by a factor of five, and it is twice as fast as the point-perturbation-based LSH method.

The third is a multi-feature filtering algorithm for region-based image similarity search. It uses approximation algorithms to generate a candidate set and then ranks the objects in that set with a more sophisticated multi-feature distance measure. It works with both feature vectors and their sketches, and it can be combined with indexing techniques to further speed up the search. Performance evaluations show that filtering is 4–13 times faster than the brute-force approach while still maintaining good search quality.

This thesis also proposes a new region-based image similarity measure, EMD* match, which uses square-root region weights and region distance thresholding. Experimental results show that EMD* match is 27%–91% more effective than previous image similarity search techniques.
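
The idea behind the first contribution (estimating a weighted ℓ1 distance from the Hamming distance between bit vectors) can be illustrated with a small Python sketch. This is not the thesis's algorithm; it assumes one standard construction in which each bit picks a dimension i with probability proportional to w_i(hi_i - lo_i), draws a threshold t uniformly from [lo_i, hi_i], and records whether x_i > t. Two vectors disagree on such a bit with probability proportional to their range-clipped difference in dimension i, so the fraction of differing bits, rescaled, estimates the weighted ℓ1 distance. The per-dimension ranges lo/hi, the clipping (standing in for thresholding), and the names make_sketcher and estimate_l1 are all assumptions for illustration.

```python
import random


def make_sketcher(weights, lo, hi, n_bits, seed=0):
    """Build a (hypothetical) sketch function mapping a vector to an n_bits-bit int.

    Each bit picks dimension i with probability proportional to
    weights[i] * (hi[i] - lo[i]) and a threshold t uniform in [lo[i], hi[i]];
    the bit is 1 when x[i] > t. Returns (sketch_fn, scale), where scale turns
    the fraction of differing bits back into a weighted L1 distance.
    """
    rng = random.Random(seed)
    mass = [w * (h - l) for w, l, h in zip(weights, lo, hi)]
    dims = rng.choices(range(len(mass)), weights=mass, k=n_bits)
    thresholds = [rng.uniform(lo[i], hi[i]) for i in dims]

    def sketch(x):
        bits = 0
        for b, (i, t) in enumerate(zip(dims, thresholds)):
            if x[i] > t:
                bits |= 1 << b
        return bits

    return sketch, sum(mass)


def hamming(a, b):
    """Number of differing bits between two integer sketches."""
    return bin(a ^ b).count("1")


def estimate_l1(sa, sb, n_bits, scale):
    """Estimate sum_i w_i * |clip(x_i) - clip(y_i)| from two sketches."""
    return hamming(sa, sb) / n_bits * scale


if __name__ == "__main__":
    w, lo, hi = [1.0, 2.0, 1.0, 0.5], [0.0] * 4, [1.0] * 4
    sketch, scale = make_sketcher(w, lo, hi, n_bits=4096)
    x, y = [0.1, 0.2, 0.3, 0.4], [0.5, 0.2, 0.9, 0.0]
    exact = sum(wi * abs(a - b) for wi, a, b in zip(w, x, y))
    print(exact, estimate_l1(sketch(x), sketch(y), 4096, scale))
```

The estimate's variance shrinks as n_bits grows, which is the space/accuracy trade-off a sketch makes: each vector is stored as n_bits bits instead of one floating-point value per dimension.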

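The second contribution probes several buckets per hash table by perturbing the query's hash value rather than the query point. Below is a minimal sketch of that general idea, assuming the Gaussian-projection (p-stable) LSH family for Euclidean distance, h(v) = floor((a·v + b) / W). The probing order is a deliberately simple assumption (the query's own bucket, then every bucket whose key differs by -1 or +1 in exactly one hash coordinate) and does not reproduce the thesis's probing sequence; the class name MultiProbeLSH and the parameters n_tables, n_hashes, and width are hypothetical.

```python
import math
import random
from collections import defaultdict


class MultiProbeLSH:
    """Toy multi-probe LSH index over Euclidean vectors (hypothetical API)."""

    def __init__(self, dim, n_tables=4, n_hashes=6, width=4.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.tables = []
        for _ in range(n_tables):
            # Each hash function: Gaussian projection a and offset b in [0, width).
            funcs = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
                     for _ in range(n_hashes)]
            self.tables.append((funcs, defaultdict(list)))

    def _key(self, funcs, v):
        # Concatenate n_hashes values floor((a.v + b) / width) into one bucket key.
        return tuple(math.floor((sum(ai * vi for ai, vi in zip(a, v)) + b) / self.width)
                     for a, b in funcs)

    def insert(self, idx, v):
        for funcs, buckets in self.tables:
            buckets[self._key(funcs, v)].append(idx)

    def candidates(self, q, probes_per_table=None):
        """Union of probed buckets over all tables; probes_per_table=1 is plain LSH."""
        found = set()
        for funcs, buckets in self.tables:
            key = self._key(funcs, q)
            # Perturb the hashed value, not the query point: home bucket first,
            # then each key differing by -1 or +1 in a single coordinate.
            probes = [key]
            for j in range(len(key)):
                for delta in (-1, 1):
                    probes.append(key[:j] + (key[j] + delta,) + key[j + 1:])
            for p in probes[:probes_per_table]:
                found.update(buckets.get(p, ()))
        return found

    def query(self, q, data, k):
        """Rank the probed candidates by exact Euclidean distance."""
        return sorted(self.candidates(q), key=lambda i: math.dist(q, data[i]))[:k]


if __name__ == "__main__":
    rng = random.Random(7)
    data = [[rng.random() for _ in range(8)] for _ in range(300)]
    index = MultiProbeLSH(dim=8)
    for i, v in enumerate(data):
        index.insert(i, v)
    q = [c + 0.01 for c in data[5]]
    print(len(index.candidates(q, 1)), len(index.candidates(q)), index.query(q, data, 3))
```

Because each table is probed several times, fewer tables are needed for a given recall than with single-probe LSH, which is where the space saving of a multi-probe scheme comes from.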
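The multi-feature filtering contribution follows a filter-and-refine pattern: a cheap approximate distance selects a candidate set, and only those candidates are ranked with the more expensive multi-feature measure. The sketch below shows that pattern generically. The feature names ("color", "texture"), the weighted sum of per-feature ℓ1 distances used as the "expensive" measure, and the function names are hypothetical stand-ins, not the thesis's multi-feature distance.

```python
import heapq
import random


def l1(a, b):
    """Plain L1 distance between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def multi_feature_distance(q, x, feature_weights):
    """Hypothetical 'expensive' distance: weighted sum of per-feature L1 distances."""
    return sum(w * l1(q[f], x[f]) for f, w in feature_weights.items())


def filter_and_refine(query, items, cheap_dist, exact_dist, n_candidates, k):
    """Return indices of the k items closest to query under exact_dist,
    evaluating exact_dist only on the n_candidates items ranked best by cheap_dist."""
    # Filter: approximate ranking over the whole collection.
    candidates = heapq.nsmallest(n_candidates, range(len(items)),
                                 key=lambda i: cheap_dist(query, items[i]))
    # Refine: exact ranking over the much smaller candidate set.
    return heapq.nsmallest(k, candidates,
                           key=lambda i: exact_dist(query, items[i]))


if __name__ == "__main__":
    rng = random.Random(1)
    items = [{"color": [rng.random() for _ in range(8)],
              "texture": [rng.random() for _ in range(4)]} for _ in range(500)]
    weights = {"color": 1.0, "texture": 0.5}

    def cheap(q, x):
        # Filter on one feature only (a stand-in for sketches or a coarse index).
        return l1(q["color"], x["color"])

    def exact(q, x):
        return multi_feature_distance(q, x, weights)

    print(filter_and_refine(items[42], items, cheap, exact, n_candidates=50, k=5))
```

With n_candidates equal to the collection size the result matches brute-force ranking; smaller values trade some recall for speed, since the expensive distance is evaluated only n_candidates times.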

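EMD* match builds on the Earth Mover's Distance (EMD) between two sets of image regions. The standard EMD is written below, with the two EMD* ingredients named in the abstract filled in as assumed forms: region weights taken as square roots of region sizes normalized to sum to 1, and the region distance capped at a threshold T. The exact normalization, the form of the threshold, and the symbols a_i, b_j, and T are assumptions for illustration, not taken from the thesis.

```latex
% Image X: regions r_1..r_m with sizes a_1..a_m; image Y: regions s_1..s_n with sizes b_1..b_n.
% Assumed square-root region weights, normalized to sum to 1:
w_i = \frac{\sqrt{a_i}}{\sum_{k=1}^{m} \sqrt{a_k}}, \qquad
u_j = \frac{\sqrt{b_j}}{\sum_{k=1}^{n} \sqrt{b_k}}
% Assumed thresholded region distance with cutoff T:
d_T(r_i, s_j) = \min\bigl(d(r_i, s_j),\, T\bigr)
% Standard EMD over flows f_{ij}:
\mathrm{EMD}(X, Y) = \min_{f_{ij} \ge 0}
  \frac{\sum_{i,j} f_{ij}\, d_T(r_i, s_j)}{\sum_{i,j} f_{ij}}
\quad \text{s.t.} \quad
\sum_{j} f_{ij} \le w_i, \;\; \sum_{i} f_{ij} \le u_j, \;\;
\sum_{i,j} f_{ij} = \min\Bigl(\sum_i w_i, \sum_j u_j\Bigr)
```

Under these assumptions both weight sets sum to 1, so the total flow is 1 and the EMD reduces to the minimum total transport cost. Capping d at T keeps a single badly matched region from dominating the score, and square-root weights reduce the dominance of large regions relative to linear area weights.
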
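Bibliographic Details

Author: Lv, Qin
Affiliation: Princeton University
Degree-granting institution: Princeton University
Discipline: Computer Science
Degree: Ph.D.
Year: 2006
Pages: 124
Original format: PDF
Language of text: English
Chinese Library Classification: Automation technology, computer technology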

Similar Literature

1. The influence of image descriptors’ dimensions’ value cardinalities on large-scale similarity search [J]. Theodoros Semertzidis, Dimitrios Rafailidis, Michael Gerassimos Strintzis. International Journal of Multimedia Information Retrieval, 2015, No. 3.
2. Large-Scale Comparison of Alternative Similarity Search Strategies with Varying Chemical Information Contents [J]. Oliver Laufkötter, Tomoyuki Miyao, Jürgen Bajorath. ACS Omega, 2019, No. 12.
3. Large-scale parallel similarity search with Product Quantization for online multimedia services [J]. Andrade Guilherme, Fernandes Andre, Gomes Jeremias M. Journal of Parallel and Distributed Computing, 2019, issue MARa.
4. Large-scale image similarity search optimization based on multi-core architecture [C]. Jun-yi Li, Jian-hua Li. International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery, 2017.
5. High-Dimensional Similarity Search for Large Datasets [D]. Dong, Wei. 2011.
6. Large-Scale Comparison of Alternative Similarity Search Strategies with Varying Chemical Information Contents [O]. Oliver Laufkötter, Tomoyuki Miyao, Jürgen Bajorath. 2019.
7. Content-Based Similarity Search in Large-Scale DNA Data Storage Systems [O]. Callista Bee, Yuan-Jyue Chen, David Ward. 2020.
