Journal: Future Generation Computer Systems

Measuring the short text similarity based on semantic and syntactic information


Abstract

Determining the similarity between short texts plays an important role in natural language processing applications such as search, query suggestion, and automatic summarization, and has attracted widespread attention. Unlike traditional long texts, short texts are characterized by short length, weak signal, and high ambiguity. Researchers have proposed many methods, ranging from simple vector space models to more sophisticated distributed semantics. However, these methods consider only the literal meaning of words, ignoring both word ambiguity and the semantic information carried by the structure of the short text. Moreover, single words are often insufficient for expressing semantics, because many terms consist of multiple words. In this paper, we propose a method for short text similarity calculation based on semantic and syntactic information: it uses knowledge and corpora to express the meaning of each term, which addresses polysemy, and it uses a constituency parse tree to capture the syntactic structure of short texts. The proposed method also uses terms, rather than individual words, as semantic units. Experimental results on ground-truth datasets demonstrate that the proposed method outperforms baseline methods.
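
To make the limitation described in the abstract concrete, the following is a minimal sketch of the kind of simple vector space baseline it refers to: cosine similarity over raw bag-of-words counts. It is not the method proposed in the paper. The function names `tokenize` and `cosine_similarity`, the regex tokenizer, and the example sentences are all assumptions introduced here for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens (illustrative tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the raw term-frequency vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    # Only words that appear in both texts contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty text has no direction in the vector space
    return dot / (norm_a * norm_b)


# Paraphrases with no shared words score zero (literal matching only).
paraphrase_score = cosine_similarity("buy a car", "purchase an automobile")

# Unrelated texts that share one word of a multi-word term score above zero.
multiword_score = cosine_similarity("new york times", "new shoes")
```

In this sketch the paraphrase pair scores zero because the two texts share no surface words, while the second pair scores above zero only because "new" occurs in both. These are the two failure modes the abstract points to: literal word matching, and treating single words rather than multi-word terms as the semantic unit.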
