Compact Features for Detection of Near-Duplicates in Distributed Retrieval

Abstract

In distributed information retrieval, answers from separate collections are combined into a single result set. However, the collections may overlap. The fact that the collections are distributed means that it is not in general feasible to prune duplicate and near-duplicate documents at index time. In this paper we introduce and analyze the grainy hash vector, a compact document representation that can be used to efficiently prune duplicate and near-duplicate documents from result lists. We demonstrate that, for a modest bandwidth and computational cost, many near-duplicates can be accurately removed from result lists produced by a cooperative distributed information retrieval system.
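The abstract does not spell out how the grainy hash vector is built, so the following is only a minimal sketch of the general idea of a compact, comparable document fingerprint, not the authors' construction. It assumes a min-hash over word 3-grams, truncated to a few bits per slot and packed into one integer. The function names (`compact_vector`, `matching_slots`, `merge_results`) and the parameters (8 slots, 4 bits each, a threshold of 6 matching slots) are hypothetical choices made for illustration.

```python
import hashlib
import re


def _hash(token: str, seed: int) -> int:
    """64-bit hash of a token under a seed (stands in for a random permutation)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def compact_vector(text: str, n_hashes: int = 8, bits: int = 4) -> int:
    """Pack n_hashes truncated min-hashes (bits wide each) into one integer.

    Assumption: word 3-grams are the unit of comparison.
    """
    words = re.findall(r"\w+", text.lower())
    # Fall back to a single shingle when the text has fewer than three words.
    shingles = {" ".join(words[i:i + 3]) for i in range(len(words) - 2)}
    shingles = shingles or {" ".join(words)}
    mask = (1 << bits) - 1
    packed = 0
    for seed in range(n_hashes):
        smallest = min(_hash(s, seed) for s in shingles)
        packed = (packed << bits) | (smallest & mask)
    return packed


def matching_slots(a: int, b: int, n_hashes: int = 8, bits: int = 4) -> int:
    """Count the slots in which two packed vectors agree."""
    diff = a ^ b  # a slot is all zeros in the XOR exactly when it matches
    mask = (1 << bits) - 1
    return sum(1 for i in range(n_hashes) if (diff >> (i * bits)) & mask == 0)


def merge_results(result_lists, threshold: int = 6):
    """Merge per-collection lists of (doc_id, score, vector) by score,
    dropping any result whose vector matches an already kept result
    in at least `threshold` slots."""
    merged = sorted(
        (r for results in result_lists for r in results),
        key=lambda r: r[1],
        reverse=True,
    )
    kept = []
    for doc_id, score, vec in merged:
        if all(matching_slots(vec, k[2]) < threshold for k in kept):
            kept.append((doc_id, score, vec))
    return kept
```

In this sketch each collection would return one small integer per result alongside its score, so the merging step can compare fingerprints without fetching document text. Narrow slots keep the vector small but let unrelated documents agree in a slot by chance, which is why the threshold requires most slots to match.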

Bibliographic details

Source: International Conference on String Processing and Information Retrieval, 2006, 12 pages
Authors: Yaniv Bernstein; Milad Shokouhi; Justin Zobel
Original format: PDF
Chinese Library Classification: Data backup and recovery

Related documents

1. IR Feature Embedded BOF Indexing Method for Near-Duplicate Video Retrieval [J]. Liao Kaiyang, Lei Hao, Zheng Yuanlin. IEEE Transactions on Circuits and Systems for Video Technology, 2019, No. 12.
2. Feature selection and comparison for near-duplicate video retrieval system [J]. Byun Sung-Woo, Son Heui-Su, Lee Seok-Pil. Basic & Clinical Pharmacology & Toxicology, 2019, No. S7.
3. Effective Multiple Feature Hashing for Large-Scale Near-Duplicate Video Retrieval [J]. Song J., Yang Y., Huang Z. IEEE Transactions on Multimedia, 2013, No. 8.
4. Compact Features for Detection of Near-Duplicates in Distributed Retrieval [C]. Yaniv Bernstein, Milad Shokouhi, Justin Zobel. String Processing and Information Retrieval; Lecture Notes in Computer Science; 4209. 2006.
5. Database selection in distributed information retrieval: A study of multi-collection information retrieval [D]. Powell, Allison Lane. 2001.
6. Large Scale Near-Duplicate Celebrity Web Images Retrieval Using Visual and Textual Features [O]. Fengcai Qiao, Cheng Wang, Xin Zhang. 2013.
7. Stochastic Non-linear Hashing for Near-Duplicate Video Retrieval using Deep Feature applicable to Large-scale Datasets [O]. 2019.
