Published in: Expert Systems with Applications

Determining and characterizing the reused text for plagiarism detection


Abstract

An important task in plagiarism detection is determining and measuring similar text portions between a given pair of documents. One of the main difficulties of this task lies in the fact that reused text is commonly modified with the aim of covering or camouflaging the plagiarism. Another difficulty is that not all similar text fragments are examples of plagiarism, since thematic coincidences also tend to produce portions of similar text. To tackle these problems, we propose a novel method for detecting likely portions of reused text. This method is able to detect common actions performed by plagiarists, such as word deletion, insertion and transposition, which allows it to obtain plausible portions of reused text. We also propose representing the identified reused text by means of a set of features that denote its degree of plagiarism, relevance and fragmentation. This new representation aims to facilitate the recognition of plagiarism by considering diverse characteristics of the reused text during the classification phase. Experimental results employing a supervised classification strategy showed that the proposed method is able to outperform traditionally used approaches.
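To make the idea of locating reused fragments and summarizing them as features more concrete, here is a minimal Python sketch. It is not the method proposed in the paper, whose details are not given in the abstract. It uses the standard-library `difflib.SequenceMatcher` on word tokens, which tolerates word insertions and deletions between matched fragments but does not handle transpositions. The function names `reused_fragments` and `describe`, the `min_len` threshold, and the three feature names are hypothetical and serve only as rough stand-ins for the notions of degree of reuse and fragmentation.

```python
from difflib import SequenceMatcher


def reused_fragments(source: str, suspicious: str, min_len: int = 2):
    """Return word-level fragments shared by both texts (illustrative only).

    Assumption: whitespace tokenization and lowercase matching; the paper's
    actual preprocessing is not described in the abstract.
    """
    src = source.lower().split()
    sus = suspicious.lower().split()
    # autojunk=False disables difflib's heuristic that ignores frequent tokens.
    matcher = SequenceMatcher(a=src, b=sus, autojunk=False)
    fragments = []
    for block in matcher.get_matching_blocks():
        # Keep only matches of at least min_len words; the final block
        # returned by difflib is a zero-length sentinel and is skipped here.
        if block.size >= min_len:
            fragments.append(sus[block.b:block.b + block.size])
    return fragments, len(sus)


def describe(fragments, suspicious_len):
    """Summarize fragments with hypothetical features (not the paper's set)."""
    reused = sum(len(f) for f in fragments)
    return {
        # Share of the suspicious text covered by matched fragments.
        "coverage": reused / suspicious_len if suspicious_len else 0.0,
        # More, shorter fragments suggest heavier editing of the reused text.
        "fragment_count": len(fragments),
        "longest_fragment": max((len(f) for f in fragments), default=0),
    }


if __name__ == "__main__":
    frags, n = reused_fragments(
        "the quick brown fox jumps over the lazy dog",
        "the quick fox jumps over the very lazy dog",
    )
    print(frags)
    print(describe(frags, n))
```

In this example one word is deleted ("brown") and one inserted ("very"), so the shared text splits into several fragments instead of one long match. A feature vector of this kind could then be passed to any supervised classifier, in the spirit of the classification phase mentioned in the abstract.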