Journal: Procedia Computer Science

Detecting Semantic Similarity Of Documents Using Natural Language Processing

Abstract

The similarity of natural-language documents can be judged by how similar the embeddings of their textual content are. Embeddings capture the lexical and semantic information of texts, and they can be obtained through bag-of-words approaches that use the embeddings of constituent words or through pre-trained encoders. This paper examines various existing approaches to obtaining embeddings from texts, which are then used to detect similarity between them. A novel model that builds upon the Universal Sentence Encoder is also developed for the same purpose. The explored models are tested on the SICK dataset, and the correlation between the ground-truth values given in the dataset and the predicted similarity is computed using the Pearson, Spearman, and Kendall’s Tau correlation metrics. Experimental results demonstrate that the novel model outperforms the existing approaches. Finally, an application is developed using the novel model to detect semantic similarity between a set of documents.
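The pipeline the abstract describes (embed each text, score pairs by embedding similarity, then correlate the scores with ground-truth values) can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's model: the three-dimensional word vectors, the sentence pairs, and the gold scores are all invented, cosine similarity is assumed as the similarity measure, and mean pooling stands in for the bag-of-words approach. Kendall's Tau is computed here in its simple tau-a form.

```python
import math

# Toy 3-dimensional word vectors, invented purely for illustration.
TOY_VECTORS = {
    "a": [0.1, 0.1, 0.1],
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "runs": [0.1, 0.9, 0.0],
    "sprints": [0.2, 0.8, 0.1],
    "car": [0.0, 0.1, 0.9],
    "stops": [0.1, 0.0, 0.8],
}


def embed(text):
    """Mean-pool the vectors of known words (a bag-of-words embedding)."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        raise ValueError("no known words in text")
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical sentence pairs and gold relatedness scores on a 1-5 scale.
pairs = [
    ("a dog runs", "a puppy sprints"),
    ("a dog runs", "a car stops"),
    ("a car stops", "a car stops"),
]
gold = [4.5, 1.2, 5.0]
predicted = [cosine(embed(a), embed(b)) for a, b in pairs]

print("Pearson: ", pearson(gold, predicted))
print("Spearman:", spearman(gold, predicted))
print("Kendall: ", kendall_tau(gold, predicted))
```

Swapping `embed` for a pre-trained sentence encoder would leave the scoring and correlation steps unchanged, which is what makes this kind of side-by-side comparison of embedding methods straightforward.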
