Similarity search for large-scale image datasets.

Abstract

Content-based image similarity search is a difficult problem because image data is high-dimensional and usually massive in volume. The main challenge is to achieve high-quality similarity search with high speed and low space usage. This thesis proposes several techniques to address the problem of building a similarity search system for large-scale image datasets. A prototype image search system, called CASS-Image (Content-Aware Search System for Images), has been implemented to demonstrate the effectiveness of these techniques.

The first contribution of this thesis is a sketch construction algorithm that converts high-dimensional feature vectors into bit vectors (sketches), such that the weighted (and thresholded) ℓ1 distance between two feature vectors can be estimated by the Hamming distance of their sketches. Experimental results show that using sketches can typically reduce the space requirement by an order of magnitude with minimal impact on similarity search quality.

The second is a hash-perturbation based LSH (Locality Sensitive Hashing) technique for approximate nearest neighbor search in high dimensions. This technique probes multiple buckets in each hash table by perturbing the hashed value of the query object. Performance evaluations show that this method is both time and space efficient. Its time efficiency is similar to that of the basic LSH method, while it reduces the space requirement by a factor of five. Its time efficiency is also twice that of the point-perturbation based LSH method.

The third is a multi-feature filtering algorithm for region-based image similarity search. This method uses approximation algorithms to generate a candidate set, and then ranks the objects in the candidate set with a more sophisticated multi-feature distance measure. It works for both feature vectors and their sketches. It can also be combined with indexing techniques to further speed up the search process. Performance evaluations show that filtering is 4 to 13 times faster than the brute-force approach, while still maintaining good search quality.

This thesis also proposes a new region-based image similarity measure, EMD* match, which uses square-root region weights and region distance thresholding. Experimental results show that EMD* match is 27% to 91% more effective than previous image similarity search techniques.
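
The sketch idea in the first contribution can be illustrated with a small Python example. This is a minimal sketch under stated assumptions, not the thesis's algorithm: each bit compares one randomly chosen dimension against a random threshold, with dimensions sampled in proportion to weight times value range, so that the expected Hamming distance is proportional to the weighted ℓ1 distance. The thresholding step mentioned in the abstract is omitted, and the names `make_sketcher`, `hamming`, and `scale` are hypothetical.

```python
import random


def make_sketcher(lo, hi, weights, n_bits, seed=0):
    """Build a function mapping a feature vector to an n_bits-wide integer sketch.

    Assumption (illustrative, not taken from the thesis): bit k tests
    x[i_k] > t_k, where dimension i_k is drawn with probability proportional
    to weights[i] * (hi[i] - lo[i]) and t_k is uniform in [lo[i_k], hi[i_k]].
    """
    rng = random.Random(seed)
    spans = [w * (h - l) for w, l, h in zip(weights, lo, hi)]
    picks = rng.choices(range(len(lo)), weights=spans, k=n_bits)
    thresholds = [rng.uniform(lo[i], hi[i]) for i in picks]

    def sketch(x):
        bits = 0
        for i, t in zip(picks, thresholds):
            bits = (bits << 1) | int(x[i] > t)
        return bits

    # Multiplying a Hamming distance by `scale` gives an estimate of the
    # weighted l1 distance sum_i weights[i] * |x[i] - y[i]|.
    scale = sum(spans) / n_bits
    return sketch, scale


def hamming(a, b):
    """Number of differing bits between two integer sketches."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    sketch, scale = make_sketcher([0.0] * 4, [1.0] * 4, [1.0] * 4, n_bits=256)
    x, y = [0.2, 0.4, 0.6, 0.8], [0.3, 0.4, 0.1, 0.8]
    true_l1 = sum(abs(a - b) for a, b in zip(x, y))
    print("true weighted l1:", true_l1)
    print("sketch estimate: ", hamming(sketch(x), sketch(y)) * scale)
```

With 256 bits, each vector in this toy setup occupies 32 bytes regardless of its original dimensionality, which is the kind of space saving the abstract refers to; the estimate becomes noisier as the number of bits shrinks.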
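
The hash-perturbation idea in the second contribution, probing several buckets per table by perturbing the query's hash value, can likewise be illustrated in Python. This is a simplified sketch under assumptions and not the thesis's probing scheme: it assumes hash functions of the form floor((a·v + b) / W) with Gaussian `a`, and it only perturbs one hash slot at a time by ±1, trying the slots whose bucket boundary lies closest to the query first. The class name `MultiProbeTable` and its parameters are hypothetical.

```python
import math
import random
from collections import defaultdict


class MultiProbeTable:
    """One LSH table that can probe neighbouring buckets at query time."""

    def __init__(self, dim, n_hashes=4, width=1.0, seed=0):
        rng = random.Random(seed)
        self.width = width
        # Assumed hash family: h_j(v) = floor((a_j . v + b_j) / width).
        self.a = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_hashes)]
        self.b = [rng.uniform(0, width) for _ in range(n_hashes)]
        self.buckets = defaultdict(list)

    def _project(self, v):
        return [(sum(ai * vi for ai, vi in zip(a, v)) + b) / self.width
                for a, b in zip(self.a, self.b)]

    def key(self, v):
        return tuple(math.floor(p) for p in self._project(v))

    def insert(self, item_id, v):
        self.buckets[self.key(v)].append(item_id)

    def probe_keys(self, q, n_probes):
        """Home bucket first, then single-slot +/-1 perturbations, nearest boundary first."""
        proj = self._project(q)
        home = [math.floor(p) for p in proj]
        steps = []
        for j, (p, h) in enumerate(zip(proj, home)):
            frac = p - h  # position of the query inside slot j, in [0, 1)
            steps.append((frac, j, -1))        # distance to the lower boundary
            steps.append((1.0 - frac, j, +1))  # distance to the upper boundary
        steps.sort()
        keys = [tuple(home)]
        for _, j, delta in steps[:max(0, n_probes - 1)]:
            perturbed = list(home)
            perturbed[j] += delta
            keys.append(tuple(perturbed))
        return keys

    def query(self, q, n_probes=1):
        """Return candidate ids from the probed buckets (to be ranked by true distance)."""
        candidates = []
        for k in self.probe_keys(q, n_probes):
            candidates.extend(self.buckets.get(k, []))
        return candidates
```

Because one table is probed several times, fewer tables are needed to reach a given recall, which is where the space saving described in the abstract comes from. A fuller version would also consider perturbing several slots at once.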

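Bibliographic record

  • Author

    Lv, Qin

  • Author affiliation

    Princeton University

  • Degree-granting institution: Princeton University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2006
  • Pages: 124 p.
  • Total pages: 124
  • Original format: PDF
  • Language of text: English (eng)
  • Chinese Library Classification: Automation technology, computer technology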