Conference: Pacific Asia Conference on Language, Information and Computation

Paraphrase Detection Based on Identical Phrase and Similar Word Matching

Abstract

Paraphrase detection has numerous important applications in natural language processing (such as clustering, summarizing, and detecting plagiarism). One approach to detecting paraphrases is to use predicate argument tuples. Although this approach achieves high paraphrase recall, its accuracy is generally low. Other approaches focus on matching similar words, but word meaning is often contextual (e.g., 'get along with,' 'look forward to'). An effective approach to detecting plagiarism would take into account the fact that plagiarists frequently cut and paste whole phrases and/or replace several words with similar words. This generally results in the paraphrased text containing identical phrases and similar words. Moreover, plagiarists usually insert and/or remove various minor words (prepositions, conjunctions, etc.) to both improve the naturalness and disguise the paraphrasing. We have developed a similarity matching (SimMat) metric for detecting paraphrases that is based on matching identical phrases and similar words and quantifying the minor words. The metric achieved the highest paraphrase detection accuracy (77.6%) when it was combined with eight standard machine translation metrics. This accuracy is better than the 77.4% rate achieved with the state-of-the-art approach for paraphrase detection.
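
The three signals named in the abstract (identical phrases, similar words, and minor words) can be illustrated with a minimal Python sketch. This is not the authors' SimMat metric: the abstract gives no formula, so the code only extracts raw counts. The names `MINOR_WORDS`, `SIMILAR`, `identical_phrases`, and `sketch_features`, the toy word lists, and the minimum phrase length of two tokens are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical stand-ins: the abstract names neither the minor-word list
# nor the resource used to decide that two words are similar.
MINOR_WORDS = {"a", "an", "the", "of", "in", "on", "at", "to",
               "for", "with", "by", "and", "or", "but"}
SIMILAR = {frozenset({"big", "large"}), frozenset({"buy", "purchase"})}


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def identical_phrases(a, b, min_len=2):
    """Return (start_a, start_b, length) for shared contiguous token runs."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(m.a, m.b, m.size)
            for m in matcher.get_matching_blocks() if m.size >= min_len]


def sketch_features(s1, s2, min_len=2):
    """Count phrase tokens, similar-word pairs, and leftover minor words."""
    a, b = tokenize(s1), tokenize(s2)
    blocks = identical_phrases(a, b, min_len)

    # Tokens already explained by an identical phrase are set aside.
    covered_a = {i for (ia, _, n) in blocks for i in range(ia, ia + n)}
    covered_b = {j for (_, jb, n) in blocks for j in range(jb, jb + n)}
    rest_a = [w for i, w in enumerate(a) if i not in covered_a]
    rest_b = [w for j, w in enumerate(b) if j not in covered_b]

    # Minor words outside the shared phrases are counted, not matched.
    minor = sum(w in MINOR_WORDS for w in rest_a + rest_b)
    content_a = [w for w in rest_a if w not in MINOR_WORDS]
    content_b = [w for w in rest_b if w not in MINOR_WORDS]

    # Greedy one-to-one matching of the remaining content words.
    similar = 0
    for w in content_a:
        for k, v in enumerate(content_b):
            if w == v or frozenset({w, v}) in SIMILAR:
                similar += 1
                del content_b[k]
                break

    return {"phrase_tokens": len(covered_a),
            "similar_words": similar,
            "minor_words": minor}
```

Calling `sketch_features("he wants to buy a big house", "he wants to purchase the large house")` returns the three counts for that sentence pair. How such counts would be combined into a single score, or fed to a classifier alongside the machine translation metrics the abstract mentions, is left open here because the abstract does not say.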
