Journal: Language Resources and Evaluation

A hybrid approach for paraphrase identification based on knowledge-enriched semantic heuristics



Abstract

In this paper, we propose a hybrid approach for sentence paraphrase identification. The proposal addresses the problem of evaluating sentence-to-sentence semantic similarity when the sentences contain named entities. Its essence is to compute the semantic similarity of named-entity tokens separately from that of the rest of the sentence text. More specifically, it integrates word semantic similarity derived from WordNet taxonomic relations with named-entity semantic relatedness inferred from Wikipedia entity co-occurrences and underpinned by Normalized Google Distance. In addition, the WordNet similarity measure is enriched with word part-of-speech (PoS) conversion aided by a Categorial Variation database (CatVar), which enhances the lexico-semantics of words. We validated our hybrid approach on two datasets: the Microsoft Research Paraphrase Corpus (MSRPC) and TREC-9 Question Variants. In our empirical evaluation, the system outperforms the baselines and most related state-of-the-art systems for paraphrase detection. We also conducted a misidentification analysis to identify the primary sources of our system's errors.
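
As a rough illustration of the named-entity component, the sketch below computes the standard Normalized Google Distance from occurrence and co-occurrence counts, such as counts of Wikipedia articles mentioning each entity. The abstract does not give the authors' exact formulation, so the function names, the count-based inputs, and the `exp(-distance)` mapping from distance to relatedness are all assumptions made for illustration.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Standard NGD from raw counts.

    fx, fy : number of documents mentioning entity x / entity y
    fxy    : number of documents mentioning both entities
    n      : total number of documents in the collection
    (Hypothetical inputs; the paper's own counting scheme may differ.)
    """
    if fx <= 0 or fy <= 0 or n <= 0:
        raise ValueError("fx, fy and n must be positive")
    if fxy <= 0:
        # The entities never co-occur: treat them as maximally distant.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    denominator = math.log(n) - min(log_fx, log_fy)
    if denominator == 0:
        # Both entities occur in every document, so they always co-occur.
        return 0.0
    return (max(log_fx, log_fy) - log_fxy) / denominator


def entity_relatedness(fx, fy, fxy, n):
    """Map the distance into (0, 1]; exp(-d) is an assumed, common choice."""
    return math.exp(-normalized_google_distance(fx, fy, fxy, n))
```

With this mapping, two entities that always appear together get a relatedness of 1.0, and entities that never co-occur get 0.0. A full system along the lines of the abstract would combine such a score with a WordNet-based word similarity for the remaining tokens. How the two are weighted is not stated here.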
