Copy detection systems for digital documents

Abstract

Partial or total duplication of document content is common in large digital libraries. We present a copy detection system that automates the detection of duplication in digital documents. The system is sentence-based and makes three contributions: it proposes an intuitive definition of similarity between documents; it produces the distribution of overlap that exists between overlapping documents; and it is resistant to inaccuracy due to large variations in document size. We report the results of several experiments that illustrate the behavior and functionality of the system.
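
The abstract does not give the system's actual similarity definition, so the following is only a minimal sketch of what a sentence-based comparison could look like. It assumes a naive punctuation-based sentence splitter, exact matching of normalized sentences, and an asymmetric "containment" score (the share of one document's sentences found in the other), chosen here because it stays meaningful when the two documents differ greatly in size. The function names `split_sentences`, `containment`, and `overlap_distribution` are hypothetical and not taken from the paper.

```python
import re


def split_sentences(text):
    """Naive splitter (an assumption): break on . ! ? and normalize case/whitespace."""
    parts = re.split(r"[.!?]+", text)
    return [" ".join(p.lower().split()) for p in parts if p.strip()]


def containment(doc_a, doc_b):
    """Fraction of doc_a's sentences that also occur in doc_b.

    Asymmetric on purpose: a short document fully copied into a long one
    scores 1.0 in one direction and a small value in the other.
    """
    a = split_sentences(doc_a)
    b = set(split_sentences(doc_b))
    if not a:
        return 0.0
    return sum(1 for s in a if s in b) / len(a)


def overlap_distribution(doc_a, doc_b, bins=4):
    """Count shared sentences in each of `bins` equal-width position bands of doc_a."""
    a = split_sentences(doc_a)
    b = set(split_sentences(doc_b))
    counts = [0] * bins
    for i, s in enumerate(a):
        if s in b:
            # Map sentence index i to its position band within doc_a.
            counts[i * bins // len(a)] += 1
    return counts


if __name__ == "__main__":
    small = "The cat sat. The dog ran."
    large = ("Intro text here. The cat sat. Something else. "
             "The dog ran. More filler. Final words.")
    print(containment(small, large))
    print(containment(large, small))
    print(overlap_distribution(large, small, bins=3))
```

Reporting the score in both directions, together with the per-band counts, is one way to show both how much of a document is duplicated and where the duplicated sentences fall; a real system would likely need a more careful sentence splitter and some tolerance for near-identical sentences.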
