Conference: Mexican International Conference on Artificial Intelligence

Towards Document Plagiarism Detection Based on the Relevance and Fragmentation of the Reused Text

Abstract

Traditionally, external plagiarism detection has been carried out by identifying and measuring the similar sections between a given pair of documents, known as the source and suspicious documents. One of the main difficulties of this task lies in the fact that not all similar text sections are examples of plagiarism, since thematic coincidences also tend to produce portions of common text. To address this problem, in this paper we propose representing the common (possibly reused) text by means of a set of features that denote its relevance and fragmentation. This new representation, used in conjunction with supervised learning algorithms, provides more elements for the automatic detection of document plagiarism; in particular, our experimental results show that it clearly outperformed the accuracy achieved by traditional n-gram-based approaches.
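The abstract does not list the actual features, so the following is only an illustrative sketch of what describing the common text of a document pair might look like. The function names, the `min_len` threshold, and the four features (fragment count, mean and maximum fragment length as rough fragmentation measures, and the share of the suspicious document covered as a crude relevance proxy) are assumptions made for this example, not the authors' feature set. Shared word sequences are found with Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def common_fragments(source, suspicious, min_len=2):
    """Return word sequences shared by both texts (at least min_len words each).

    Hypothetical helper: the paper's own matching procedure is not described
    in the abstract.
    """
    src = source.lower().split()
    sus = suspicious.lower().split()
    matcher = SequenceMatcher(None, src, sus, autojunk=False)
    # get_matching_blocks() ends with a zero-length sentinel, which the
    # min_len filter discards.
    return [
        src[m.a:m.a + m.size]
        for m in matcher.get_matching_blocks()
        if m.size >= min_len
    ]


def reuse_features(source, suspicious, min_len=2):
    """Describe the common text with a few illustrative (assumed) features."""
    frags = common_fragments(source, suspicious, min_len)
    sus_len = len(suspicious.split())
    shared = sum(len(f) for f in frags)
    return {
        # Fragmentation: how many pieces the common text is split into,
        # and how long those pieces are.
        "num_fragments": len(frags),
        "mean_fragment_len": shared / len(frags) if frags else 0.0,
        "max_fragment_len": max((len(f) for f in frags), default=0),
        # Crude relevance proxy: share of the suspicious text that is common.
        "coverage": shared / sus_len if sus_len else 0.0,
    }


if __name__ == "__main__":
    features = reuse_features(
        "the quick brown fox jumps over the lazy dog",
        "a quick brown fox sat near the lazy dog",
    )
    print(features)
```

A feature dictionary of this kind, computed for each labeled document pair, could then serve as the input vector to any supervised classifier, which is the general setup the abstract describes.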
