
Document plagiarism detection algorithm using semantic networks


Abstract

The vast increase in documents available on the World Wide Web (WWW) and the ease of access to these documents have led to a serious problem of using others' work without giving credit. Although many methods have been developed to detect some instances of plagiarism, such as changes to sentence structure or slight replacement of words by their synonyms, it is often hard to reveal plagiarism when the copied sentences are deliberately modified. This project proposes an algorithm for plagiarism detection over the Web using semantic networks. The corpus of this study contains 610 documents downloaded from the Web, 10 of which were selected as the sources of 20 manually plagiarized documents. The algorithm was compared to an N-gram representation, and the results show that an appropriate semantic representation of sentences derived from WordNet's relations outperforms N-grams with different similarity measures in detecting the plagiarized sentences. They also show that a proposed method based on extracting named entities and common nouns is in general capable of retrieving the source documents from the Web using a search engine API when sentences are moderately plagiarized.
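To make the N-gram baseline mentioned in the abstract concrete, here is a minimal sketch of sentence comparison with word N-grams. The abstract does not state the N-gram size, the tokenization, or which similarity measures were used, so the trigram default, the regex tokenizer, and the Jaccard measure below are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import re


def word_ngrams(sentence: str, n: int = 3) -> set:
    """Return the set of word n-grams in a sentence.

    Assumption: tokens are lowercased alphanumeric runs; the source
    does not describe its tokenization.
    """
    tokens = re.findall(r"[a-z0-9]+", sentence.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def ngram_similarity(s1: str, s2: str, n: int = 3) -> float:
    """Similarity of two sentences under a word n-gram representation."""
    return jaccard(word_ngrams(s1, n), word_ngrams(s2, n))


source = "The quick brown fox jumps over the lazy dog"
copied = "The quick brown fox jumps over the lazy dog"
reworded = "A fast brown fox leaps over a sleepy dog"

print(ngram_similarity(source, copied))         # verbatim copy
print(ngram_similarity(source, reworded))       # synonym substitution, trigrams
print(ngram_similarity(source, reworded, n=1))  # synonym substitution, unigrams
```

In this toy case the verbatim copy scores 1.0, while the reworded sentence shares no trigram with the source and scores 0.0. That is the kind of deliberately modified sentence the abstract says surface-level methods struggle with, and it is the motivation for a representation that relates words through WordNet.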

Bibliographic details

  • Author

    Ahmed Muftah Ahmed Jabr

  • Affiliation
  • Year: 2009
  • Total pages
  • Original format: PDF
  • Language: English
  • CLC classification
