Conference: ACM Symposium on Document Engineering

Citation Pattern Matching Algorithms for Citation-based Plagiarism Detection: Greedy Citation Tiling, Citation Chunking and Longest Common Citation Sequence



Abstract

Plagiarism detection systems have been developed to locate instances of plagiarism, e.g., within scientific papers. Studies have shown that existing approaches deliver reasonable results in identifying copy-and-paste plagiarism, but fail to detect more sophisticated forms such as paraphrased, translated, or idea plagiarism. The authors of this paper demonstrated in recent studies [4, 15] that the detection rate can be significantly improved by analyzing the citations of a document in addition to its text. Citations are valuable language-independent markers that are similar to a fingerprint. In fact, our examinations of real-world cases have shown that the order of citations in a document often remains similar even if the text has been strongly paraphrased or translated in order to disguise plagiarism. This paper introduces three algorithms and discusses their suitability for Citation-based Plagiarism Detection. Because plagiarism can occur in numerous ways, these algorithms need to be versatile: they must be capable of detecting transpositions, scaling, and combinations of both, in local and global form. The algorithms are named Greedy Citation Tiling, Citation Chunking, and Longest Common Citation Sequence. The evaluation showed that common forms of plagiarism can be detected reliably if these algorithms are combined.
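To make the idea of comparing citation order concrete, the following is a minimal sketch of what a Longest Common Citation Sequence computation might look like. The abstract does not specify the algorithm, so this assumes it amounts to the classic longest-common-subsequence problem applied to the ordered lists of cited sources. The function name `longest_common_citation_sequence` and the integer source identifiers are hypothetical, chosen only for illustration.

```python
from typing import Hashable, List, Sequence


def longest_common_citation_sequence(
    doc_a: Sequence[Hashable], doc_b: Sequence[Hashable]
) -> List[Hashable]:
    """Return one longest common subsequence of two citation sequences.

    Each sequence lists the identifiers of cited sources in the order in
    which the citations appear in the document (an assumed representation).
    """
    n, m = len(doc_a), len(doc_b)
    # table[i][j] holds the LCS length of the suffixes doc_a[i:] and doc_b[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if doc_a[i] == doc_b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the start to recover one optimal subsequence.
    result: List[Hashable] = []
    i = j = 0
    while i < n and j < m:
        if doc_a[i] == doc_b[j]:
            result.append(doc_a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return result


# Hypothetical example: numbers stand for cited sources.
source_doc = [1, 2, 3, 4, 5, 6]
suspicious_doc = [3, 1, 2, 9, 4, 6, 5]  # transposed, with one extra citation
shared = longest_common_citation_sequence(source_doc, suspicious_doc)
print(len(shared), shared)
```

A subsequence tolerates inserted citations (scaling) but not reordering, which is consistent with the abstract's point that the three algorithms complement one another and work best in combination.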
