Managing Deja Vu: Collection building for the identification of nonidentical duplicate documents

Conrad JG; Schriber CP

Abstract

As online document collections continue to expand, both on the Web and in proprietary environments, the need for duplicate detection becomes more critical. Few users wish to retrieve search results consisting of sets of duplicate documents, whether identical duplicates or close variants. The goal of this work is to facilitate (a) investigations into the phenomenon of near duplicates and (b) algorithmic approaches to minimizing its deleterious effect on search results. Harnessing the expertise of both client-users and professional searchers, we establish principled methods to generate a test collection for identifying and handling nonidentical duplicate documents. We subsequently examine a flexible method of characterizing and comparing documents to permit the identification of near duplicates. This method has produced promising results following an extensive evaluation using a production-based test collection created by domain experts.

Bibliographic details

Source
Journal of the American Society for Information Science and Technology | 2006, No. 7 | pp. 921-932 | 12 pages
Authors
Conrad JG; Schriber CP
Author affiliations

Thomson Legal & Regulatory, Res & Dev, St Paul, MN 55123 USA;

Thomson West, Business & Informat News, St Paul, MN 55123 USA
Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Science, scientific research
Keywords
RETRIEVAL EFFECTIVENESS; RELEVANCE JUDGMENTS

Similar literature

1. Deja vu - A study of duplicate citations in Medline [J]. Errami M, Hicks JM, Fisher W. Bioinformatics, 2008, No. 2.
2. Managing leadership in university reform: Data-led decision-making, the cost of learning and deja vu? [J]. Liz Browne, Steve Rayner. Education management administration & leadership, 2015, No. 2.
3. Building a Plant Health Care Program: Deja Vu, All Over Again [J]. John Ball. Tree Care Industry, 2017, No. 2.
4. DEJA-VU: Double Feature Presentation and Iterated Loss in Deep Transformer Networks [C]. Andros Tjandra, Chunxi Liu, Frank Zhang. IEEE International Conference on Acoustics, Speech and Signal Processing, 2020.
5. Volume I: The Composite Instrument: Pitch, Percussion, and Instrumental Transformation in Michael Colgrass's deja vu, Volume II: Symphony for Percussion Quartet and Wind Ensemble [D]. French, Daniel E., 2016.
6. Deja Vu of retrograde recanalization of coronary chronic total occlusion: A tale of a journey from Japan to India [O]. Debabrata Dash, 2016.
7. Deja vu: Medieval Motifs in Modern Arab Political Life [O]. V. V. Naumkin, V. A. Kuznetsov, 2019.
