Intrinsic Plagiarism Detection

Abstract

Current research on automatic plagiarism detection for text documents focuses on algorithms that compare suspicious documents against potential original documents. Though these approaches perform well in identifying copied or even modified passages, they assume a closed world: a reference collection must be given against which a suspicious document can be compared. This raises the question of whether plagiarized passages within a document can be detected automatically if no reference is given, e.g., if the plagiarized passages stem from a book that is not available in digital form. We call this problem class intrinsic plagiarism detection. The paper is devoted to this problem class; it shows that it is possible to identify potentially plagiarized passages by analyzing a single document with respect to variations in writing style. Our contributions are fourfold: (ⅰ) a taxonomy of plagiarism delicts along with detection methods, (ⅱ) new features for the quantification of style aspects, (ⅲ) a publicly available plagiarism corpus for benchmark comparisons, and (ⅳ) promising results in non-trivial plagiarism detection settings: in our experiments we achieved recall values of 85% with a precision of 75% or better.
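
The abstract does not spell out which style features the paper uses, so the following is only a minimal sketch of the general idea: split a single document into passages, quantify one style aspect per passage, and flag passages that deviate strongly from the document-wide average. Average word length is an assumed stand-in feature, and the names style_feature, flag_style_outliers, and precision_recall, as well as the 1.5 standard-deviation threshold, are hypothetical choices for illustration, not the authors' method.

```python
import re
from statistics import mean, pstdev


def style_feature(text):
    """Average word length of a passage (assumed stand-in for a style feature)."""
    words = re.findall(r"[A-Za-z']+", text)
    return mean(len(w) for w in words) if words else 0.0


def flag_style_outliers(passages, threshold=1.5):
    """Return indices of passages whose feature value lies more than
    `threshold` population standard deviations from the document mean."""
    values = [style_feature(p) for p in passages]
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All passages look identical under this feature: nothing to flag.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def precision_recall(flagged, truth):
    """Precision and recall of flagged passage indices against known ones."""
    flagged, truth = set(flagged), set(truth)
    true_positives = len(flagged & truth)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall
```

No reference collection is consulted at any point; the only evidence is the contrast between one passage and the rest of the same document. A single feature like this is far too weak for real use and would normally be combined with others. Precision and recall are computed the usual way, which is how figures such as those reported in the abstract are defined.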