Journal: Concurrency and Computation: Practice and Experience

A survey on the techniques, applications, and performance of short text semantic similarity

Abstract

Short text similarity plays an important role in natural language processing (NLP) and has been applied in many fields. Because short texts lack sufficient context, their similarity is difficult to measure. Using semantic similarity to compute textual similarity has attracted attention from both academia and industry and has achieved better results. In this survey, we present a comprehensive and systematic analysis of semantic similarity. We first propose three categories of semantic similarity: corpus-based, knowledge-based, and deep learning (DL)-based. We analyze the pros and cons of representative and novel algorithms in each category. Our analysis also covers the applications of these similarity measurement methods in other areas of NLP. We then evaluate state-of-the-art DL methods on four common datasets; the results show that DL-based methods better address the challenges of short text similarity, such as sparsity and complexity. In particular, the Bidirectional Encoder Representations from Transformers (BERT) model makes full use of both the scarce information in short texts and semantic information, and achieves higher accuracy and F1 scores. Finally, we propose some future research directions.
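To make the sparsity problem and the accuracy/F1 evaluation mentioned above concrete, here is a minimal, hypothetical sketch. It is not one of the survey's methods: it uses a plain term-frequency cosine similarity, a simple lexical baseline rather than a corpus-based, knowledge-based, or DL-based measure. It labels a pair "similar" above an arbitrary threshold and scores the predictions with accuracy and F1. The whitespace tokenizer, the 0.5 threshold, and the example sentence pairs are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between bag-of-words term-frequency vectors."""
    # Assumption: naive lowercasing plus whitespace tokenization.
    tf_a = Counter(text_a.lower().split())
    tf_b = Counter(text_b.lower().split())
    # Only terms shared by both texts contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def accuracy_and_f1(gold: list[int], pred: list[int]) -> tuple[float, float]:
    """Accuracy and F1 for binary labels (1 = similar, 0 = not similar)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    accuracy = correct / len(gold) if gold else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Hypothetical labelled pairs; the third is a paraphrase with no shared words.
    pairs = [
        ("how do i learn python", "how can i learn python"),
        ("what is the capital of france", "best pizza in new york"),
        ("the car is fast", "that automobile runs quickly"),
    ]
    gold = [1, 0, 1]
    threshold = 0.5  # arbitrary cut-off, assumed for illustration
    pred = [1 if cosine_similarity(a, b) >= threshold else 0 for a, b in pairs]
    acc, f1 = accuracy_and_f1(gold, pred)
    print(f"accuracy={acc:.3f} F1={f1:.3f}")  # prints "accuracy=0.667 F1=0.667"
```

The third pair shows the sparsity issue the abstract describes. It is a paraphrase, but because the two texts share no tokens, the lexical score is 0.0 and the pair is missed. Semantic measures such as embedding-based or BERT-based models aim to close this kind of gap.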