Journal: Information Processing & Management

A deep network model for paraphrase detection in short text messages


Abstract

This paper is concerned with paraphrase detection, i.e., identifying sentences that are semantically identical. The ability to detect similar sentences written in natural language is crucial for several applications, such as text mining, text summarization, plagiarism detection, authorship authentication and question answering. Recognizing this importance, we study in particular how to address the challenges of detecting paraphrases in user-generated short texts, such as Twitter messages, which often contain language irregularities and noise, and do not necessarily contain as much semantic information as longer clean texts. We propose a novel deep neural network-based approach that relies on coarse-grained sentence modelling using a convolutional neural network (CNN) and a recurrent neural network (RNN) model, combined with a specific fine-grained word-level similarity matching model. More specifically, we develop a new architecture, called DeepParaphrase, which creates an informative semantic representation of each sentence by (1) using a CNN to extract local region information in the form of important n-grams from the sentence, and (2) applying an RNN to capture long-term dependency information. In addition, we perform a comparative study of state-of-the-art approaches to paraphrase detection. An important insight from this study is that existing paraphrase approaches perform well when applied to clean texts, but they do not necessarily deliver good performance on noisy texts, and vice versa. In contrast, our evaluation shows that the proposed DeepParaphrase-based approach achieves good results on both types of text, making it more robust and generic than the existing approaches.
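The three ingredients named in the abstract (n-gram extraction with a CNN, a recurrent pass for long-range context, and word-level similarity matching) can be illustrated with a toy sketch. This is not the DeepParaphrase implementation: the abstract gives no layer sizes, embeddings, training procedure, or details of how the parts are combined, so everything below is an assumption made for illustration. The weights are random and untrained, the embeddings are pseudo-random stand-ins, feeding the CNN output into the RNN is one possible arrangement, and all function names (`embed`, `conv_ngrams`, `rnn_last_state`, `word_match_score`, `compare`) are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm > 0 else 0.0


def embed(tokens, dim=8):
    # Hypothetical stand-in for pretrained word embeddings:
    # each token maps to a fixed pseudo-random vector seeded by its text.
    vectors = []
    for tok in tokens:
        rng = random.Random(tok)
        vectors.append([rng.uniform(-1.0, 1.0) for _ in range(dim)])
    return vectors


def conv_ngrams(vectors, filters, n):
    # CNN step (assumed form): slide every filter over each window of n
    # consecutive word vectors and apply a ReLU. Returns one feature
    # vector per window; empty if the sentence has fewer than n tokens.
    features = []
    for i in range(len(vectors) - n + 1):
        window = [x for vec in vectors[i:i + n] for x in vec]
        features.append([max(0.0, dot(f, window)) for f in filters])
    return features


def rnn_last_state(sequence, w_in, w_rec):
    # RNN step (assumed form): plain tanh recurrence over the n-gram
    # features; the final hidden state serves as the sentence vector.
    hidden = [0.0] * len(w_rec)
    for x in sequence:
        hidden = [math.tanh(dot(w_in[j], x) + dot(w_rec[j], hidden))
                  for j in range(len(w_rec))]
    return hidden


def word_match_score(vecs_a, vecs_b):
    # Fine-grained matching (assumed form): for each word in A take its
    # best cosine similarity against any word in B, then average.
    return sum(max(cosine(u, v) for v in vecs_b) for u in vecs_a) / len(vecs_a)


def compare(sent_a, sent_b, n=2, n_filters=4, hidden=3, dim=8):
    rng = random.Random(0)  # untrained, random weights for illustration only

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    filters = matrix(n_filters, n * dim)
    w_in, w_rec = matrix(hidden, n_filters), matrix(hidden, hidden)
    vecs_a = embed(sent_a.lower().split(), dim)
    vecs_b = embed(sent_b.lower().split(), dim)
    state_a = rnn_last_state(conv_ngrams(vecs_a, filters, n), w_in, w_rec)
    state_b = rnn_last_state(conv_ngrams(vecs_b, filters, n), w_in, w_rec)
    return {
        "sentence_sim": cosine(state_a, state_b),          # coarse-grained
        "word_match": word_match_score(vecs_a, vecs_b),    # fine-grained
    }


if __name__ == "__main__":
    print(compare("the cat sat on the mat", "a cat was sitting on the mat"))
```

In a trained model the two scores would typically be fed to a classifier that outputs a paraphrase decision; here they are only returned side by side to show what each component contributes.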
