Plagiarism Detection Using Stopword n-grams

Abstract

In this paper, a novel method for detecting plagiarized passages in document collections is presented. In contrast to previous work in this field, which uses content terms to represent documents, the proposed method is based on a small list of stopwords (i.e., very frequent words). We show that stopword n-grams reveal important information for plagiarism detection: they capture syntactic similarities between suspicious and original documents, and they can be used to detect the exact boundaries of plagiarized passages. Experimental results on a publicly available corpus demonstrate that the performance of the proposed approach is competitive with the best reported results. More importantly, it achieves significantly better results on difficult plagiarism cases, where the plagiarized passages are highly modified and most of the words or phrases have been replaced with synonyms.
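The abstract's core idea can be illustrated with a short sketch. A document is reduced to the sequence of its stopwords, and n-grams over that sequence are compared between a suspicious and an original document. This is a minimal illustration under stated assumptions, not the paper's implementation:

- The stopword list `STOPWORDS` is a hypothetical example. The abstract only says "a small list" and does not give it here.
- `n=3` is chosen to suit a toy example.
- `rough_boundaries` is a hypothetical, deliberately crude stand-in for the paper's passage-boundary detection. It simply takes the span covering all shared n-grams.

```python
import re
from collections import defaultdict

# Hypothetical, illustrative stopword list. It is NOT the list used in the
# paper; any small set of very frequent words would serve for this sketch.
STOPWORDS = frozenset({
    "the", "of", "and", "a", "in", "to", "is", "was", "it", "for",
    "with", "he", "be", "on", "that", "by", "at", "as", "this", "from",
    "but", "not", "are", "or", "his", "have", "they", "which", "an", "had",
})


def stopword_ngrams(text, n):
    """Map each stopword n-gram in `text` to the (first, last) token indices
    it spans in the full token stream; content words are discarded."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Keep only stopwords, remembering where each one sits among all tokens.
    seq = [(i, tok) for i, tok in enumerate(tokens) if tok in STOPWORDS]
    grams = defaultdict(list)
    for k in range(len(seq) - n + 1):
        window = seq[k:k + n]
        gram = tuple(tok for _, tok in window)
        grams[gram].append((window[0][0], window[-1][0]))
    return grams


def rough_boundaries(suspicious, original, n=3):
    """Hypothetical, crude passage estimate: the token span in `suspicious`
    covering every stopword n-gram it shares with `original`, or None."""
    s = stopword_ngrams(suspicious, n)
    o = stopword_ngrams(original, n)
    spans = [span for gram in s.keys() & o.keys() for span in s[gram]]
    if not spans:
        return None
    return min(a for a, _ in spans), max(b for _, b in spans)


# Every content word is replaced by a synonym, but the stopword skeleton survives.
original = "The cat sat on the mat and it was happy with the view of the garden."
suspicious = "A feline rested on the rug and it was glad with the sight of the yard."
print(rough_boundaries(suspicious, original))  # → (3, 14)
```

In the usage example, all content words differ, yet seven of the eight stopword trigrams still match. This mirrors the abstract's claim about heavily paraphrased passages. The returned span (tokens 3 to 14, "on" through the final "the") always starts and ends on a stopword, so it is coarser than a real boundary-detection step would be.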