International Journal of Pattern Recognition and Artificial Intelligence

Evaluation of State-of-the-Art Paraphrase Identification and Its Application to Automatic Plagiarism Detection

Abstract

Paraphrase identification is a natural language processing (NLP) problem that involves determining whether two text segments have the same meaning. Various NLP applications rely on a solution to this problem, including automatic plagiarism detection, text summarization, machine translation (MT), and question answering. Paraphrase identification methods in the literature fall into two main classes: similarity-based methods and classification methods. This paper presents a critical study and evaluation of existing paraphrase identification methods and of their application to automatic plagiarism detection. It presents the classes of paraphrase phenomena, the main methods, and the feature sets used by each method. All methods and features are discussed and enumerated in a table for easy comparison, and their performance on benchmark corpora is discussed and compared in further tables. Automatic plagiarism detection is then presented as an application of paraphrase identification, and the performance of existing plagiarism detection systems capable of detecting paraphrases is compared and discussed on benchmark corpora. The main outcome of this study is the identification of word overlap, structural representations, and MT measures as the feature subsets that yield the best performance for support vector machines (SVMs) in both paraphrase identification and plagiarism detection on these corpora. The results achieved by deep learning techniques indicate that these techniques are the most promising research direction in this field.
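To make the two method classes and the "word overlap" and "MT measures" feature families more concrete, here is a minimal sketch in Python. It is an illustration only, not the method of any system evaluated in the paper: the regex tokenizer, the feature names (`jaccard`, `p1`, `r1`, `f1`, ...), and the 0.5 threshold are all hypothetical choices. The clipped n-gram precision is borrowed from BLEU-style MT evaluation but is not full BLEU. A similarity-based method thresholds a single score, as `is_paraphrase` does; a classification method would instead feed the whole feature dictionary to a trained classifier such as an SVM, which is not shown here.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical, deliberately simple tokenizer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens: list[str], n: int) -> Counter:
    # Multiset of contiguous n-grams; empty when the text is shorter than n.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap_features(s1: str, s2: str, max_n: int = 2) -> dict[str, float]:
    """Word-overlap and BLEU-style n-gram features for a sentence pair."""
    t1, t2 = tokenize(s1), tokenize(s2)
    set1, set2 = set(t1), set(t2)
    union = set1 | set2
    # Jaccard similarity over word types (a classic word-overlap feature).
    feats = {"jaccard": len(set1 & set2) / len(union) if union else 0.0}
    for n in range(1, max_n + 1):
        g1, g2 = ngrams(t1, n), ngrams(t2, n)
        # Clipped matches: each n-gram counts at most min(count1, count2) times.
        common = sum((g1 & g2).values())
        p = common / sum(g1.values()) if g1 else 0.0  # precision w.r.t. s1
        r = common / sum(g2.values()) if g2 else 0.0  # recall w.r.t. s2
        feats[f"p{n}"] = p
        feats[f"r{n}"] = r
        feats[f"f{n}"] = 2 * p * r / (p + r) if p + r else 0.0
    return feats


def is_paraphrase(s1: str, s2: str, threshold: float = 0.5) -> bool:
    # Similarity-based decision: threshold the unigram F-score (assumed cutoff).
    return overlap_features(s1, s2)["f1"] >= threshold


if __name__ == "__main__":
    a = "The cat sat on the mat."
    b = "The cat was sitting on the mat."
    print(overlap_features(a, b))
    print(is_paraphrase(a, b))
```

For the pair above, the unigram F-score is 10/13 (about 0.77), so the thresholded decision is a paraphrase. An unrelated sentence pair shares no tokens, scores 0 on every feature, and is rejected. Pure lexical overlap like this is easily fooled by word reordering or synonym substitution, which is one reason the abstract also highlights structural representations.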
