Journal: Information Retrieval

Using word semantic concepts for plagiarism detection in text documents

Abstract

Plagiarism is a common problem in the modern age. With the advance of the Internet, it has become increasingly convenient to access other people's writings and publications. Plagiarism may occur when someone uses the content of a text in an improper way. Because it infringes intellectual property rights, plagiarism is a serious problem. Detecting it effectively, however, is a challenging task. Traditional methods, such as the vector space model or bag-of-words, fall short of a good solution because they cannot handle the semantics of words satisfactorily. In this paper, we propose a new method for plagiarism detection. We use Word2vec to transform words into word vectors, which reveal the semantic relationships among different words. Using these word vectors, words are clustered into concepts. Documents and their paragraphs are then represented in terms of concepts, so that plagiarism detection can be done more effectively. A number of experiments are conducted to demonstrate the good performance of the proposed method.
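
The pipeline outlined in the abstract (word vectors, then concepts, then concept-level comparison of texts) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the 2-D vectors in WORD_VECTORS are hypothetical stand-ins for real Word2vec output, the greedy threshold clustering is a simple substitute chosen for brevity (the abstract does not say which clustering algorithm is used), and all function names and the 0.95 threshold are invented for this example.

```python
import math
from collections import Counter

# Hypothetical 2-D vectors standing in for trained Word2vec embeddings.
WORD_VECTORS = {
    "car": (0.9, 0.1),
    "automobile": (0.88, 0.15),
    "vehicle": (0.85, 0.2),
    "fast": (0.1, 0.9),
    "quick": (0.12, 0.88),
    "rapid": (0.15, 0.92),
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_words(vectors, threshold=0.95):
    """Greedy clustering (an assumption, not the paper's algorithm):
    a word joins the first concept whose seed vector is similar enough,
    otherwise it starts a new concept."""
    seeds = []
    word_to_concept = {}
    for word, vec in vectors.items():
        for concept_id, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                word_to_concept[word] = concept_id
                break
        else:
            seeds.append(vec)
            word_to_concept[word] = len(seeds) - 1
    return word_to_concept


def concept_vector(text, word_to_concept, n_concepts):
    """Represent a text as counts of the concepts its known words belong to."""
    counts = Counter(
        word_to_concept[w] for w in text.lower().split() if w in word_to_concept
    )
    return [counts.get(c, 0) for c in range(n_concepts)]


WORD_TO_CONCEPT = cluster_words(WORD_VECTORS)
N_CONCEPTS = max(WORD_TO_CONCEPT.values()) + 1


def concept_similarity(text_a, text_b):
    """Cosine similarity between the concept representations of two texts."""
    return cosine(
        concept_vector(text_a, WORD_TO_CONCEPT, N_CONCEPTS),
        concept_vector(text_b, WORD_TO_CONCEPT, N_CONCEPTS),
    )


if __name__ == "__main__":
    # The two sentences share no content words but map to the same concepts.
    print(concept_similarity("the car is fast", "an automobile is quick"))
```

In this toy setup, "car" and "automobile" land in the same concept, so a paraphrase that swaps synonyms still scores as highly similar, which a plain bag-of-words comparison over the same sentences would miss. A real system would presumably train or load actual embeddings and tune the clustering and any decision threshold on labeled data.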