Journal: Computer Speech and Language
Combining sentence similarities measures to identify paraphrases

Abstract

Paraphrase identification is the task of verifying whether two sentences are semantically equivalent. It is applied in many natural language tasks, such as text summarization, information retrieval, text categorization, and machine translation. In general, methods for paraphrase identification perform three steps. First, they represent sentences as vectors using a bag of words or syntactic information about the words present in the sentence. Next, this representation is used to compute different similarity measures between the two sentences. In the third step, these similarities are given as input to a machine learning algorithm that classifies the two sentences as a paraphrase or not. However, two important problems in paraphrase identification are not handled: (i) the meaning problem: two sentences may share the same meaning while being composed of different words; and (ii) the word order problem: the order of the words in a sentence may change the meaning of the text. This paper proposes a paraphrase identification system that represents each pair of sentences as a combination of different similarity measures. These measures extract lexical, syntactic, and semantic components of the sentences, encompassed in a graph. The proposed method was benchmarked using the Microsoft Paraphrase Corpus, which is the publicly available standard dataset for the task. Different machine learning algorithms were applied to classify a sentence pair as paraphrase or not. The results show that the proposed method outperforms state-of-the-art systems.
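
The first two steps of the generic pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the paper's method: the three measures (bag-of-words cosine, Jaccard overlap, and a longest-common-subsequence ratio) are stand-ins chosen for brevity, and every function name here is hypothetical. The sketch also shows the word order problem: two sentences with identical words but different meanings score as identical on the bag-of-words measures, and only the order-sensitive measure separates them.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(sentence):
    # Lowercase, split on whitespace, strip surrounding punctuation.
    tokens = (w.strip(PUNCT).lower() for w in sentence.split())
    return [t for t in tokens if t]

def cosine_bow(a, b):
    # Cosine similarity between bag-of-words count vectors (ignores order).
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def jaccard(a, b):
    # Shared vocabulary divided by combined vocabulary (ignores order).
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def lcs_ratio(a, b):
    # Longest common subsequence length over the longer sentence length.
    # Unlike the two measures above, this one is sensitive to word order.
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    longest = max(len(a), len(b))
    return prev[-1] / longest if longest else 0.0

def similarity_features(s1, s2):
    # Step 1: represent each sentence as tokens; step 2: compute similarities.
    a, b = tokenize(s1), tokenize(s2)
    return [cosine_bow(a, b), jaccard(a, b), lcs_ratio(a, b)]

if __name__ == "__main__":
    # Same words, different meaning: only the order-sensitive feature drops.
    print(similarity_features("The dog bit the man.", "The man bit the dog."))
```

In the third step, a feature vector like the one returned by `similarity_features` would be passed, together with a paraphrase/non-paraphrase label, to whichever classifier is being evaluated; that step is omitted here because the abstract does not name the algorithms used.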