Detecting Semantic Similarity Of Documents Using Natural Language Processing

Saurabh Agarwala; Aniketh Anagawadi; Ram Mohana Reddy Guddeti

Abstract

The similarity of documents in natural languages can be judged by how similar the embeddings of their textual content are. Embeddings capture the lexical and semantic information of texts, and they can be obtained through bag-of-words approaches that use the embeddings of constituent words or through pre-trained encoders. This paper examines various existing approaches to obtaining embeddings from texts, which are then used to detect similarity between them. A novel model that builds upon the Universal Sentence Encoder is also developed for the same purpose. The explored models are tested on the SICK dataset, and the correlation between the ground truth values given in the dataset and the predicted similarity is computed using the Pearson, Spearman and Kendall's Tau correlation metrics. Experimental results demonstrate that the novel model outperforms the existing approaches. Finally, an application is developed using the novel model to detect semantic similarity between a set of documents.
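The pipeline described in the abstract (embed two texts, score their similarity, then correlate the predicted scores with human ratings) can be sketched as follows. This is a minimal illustration and not the authors' model: the word vectors, sentence pairs, and gold ratings are hypothetical toy values, cosine similarity is assumed as the similarity measure, and the Kendall variant shown is tau-a, which ignores ties. A real experiment would use pre-trained embeddings or an encoder such as the Universal Sentence Encoder, and the SICK dataset ratings.

```python
import math
from itertools import combinations

# Hypothetical 3-dimensional word vectors, for illustration only.
TOY_VECTORS = {
    "a": [0.1, 0.1, 0.1],
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "runs": [0.1, 0.9, 0.0],
    "sprints": [0.2, 0.8, 0.1],
    "car": [0.0, 0.1, 0.9],
    "stops": [0.1, 0.0, 0.8],
}


def embed(text):
    """Bag-of-words embedding: the mean of the vectors of the known words."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        raise ValueError("no known words in text")
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical sentence pairs and human ratings (not taken from SICK).
pairs = [
    ("a dog runs", "a puppy sprints"),
    ("a dog runs", "a dog stops"),
    ("a dog runs", "a car stops"),
]
gold = [4.5, 3.0, 1.5]
predicted = [cosine(embed(a), embed(b)) for a, b in pairs]

print("Pearson:", pearson(gold, predicted))
print("Spearman:", spearman(gold, predicted))
print("Kendall tau-a:", kendall_tau(gold, predicted))
```

Pearson measures linear agreement between predicted and rated similarity, while Spearman and Kendall depend only on how the pairs are ordered, so reporting all three shows whether a model ranks pairs correctly even when its scores are not on the same scale as the ratings.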

Bibliographic details

Source: Procedia Computer Science, 2021, issue a, 8 pages
Authors: Saurabh Agarwala; Aniketh Anagawadi; Ram Mohana Reddy Guddeti
Original format: PDF
Keywords: Embeddings; Natural Language Processing; Semantic Similarity; Deep Learning; Computational Linguistic

Similar literature

1. Semantic similarity of short texts in languages with a deficient natural language processing support [J]. Bojan Furlan, Vuk Batanovic, Bosko Nikolic. Decision Support Systems, 2013, issue 3.
2. A Natural Language Processing Approach to Measuring Treatment Adherence and Consistency Using Semantic Similarity [J]. Kylie L. Anglin, Vivian C. Wong, Arielle Boguslav. AERA Open, 2021, issue a.
3. Natural Language Processing for Detecting Forward Reference in a Document [J]. Daniel Siahaan, Izzatul Umami. IPTEK The Journal for Technology and Science, 2012, issue 4.
4. On the Usage of Semantic Text-Similarity Metrics for Natural Language Processing in Russian [C]. Mikhail Koroteev. International Conference on Management of Large-Scale System Development, 2020.
5. Semantic Similarity Detection in Natural Language Documents [D]. Zhao, Lianyu. 2012.
6. Understanding the spatial dimension of natural language by measuring the spatial semantic similarity of words through a scalable geospatial context window [O]. Bozhi Wang, Teng Fei, Yuhao Kang. 2020.
7. Exploring the feasibility and accuracy of Latent Semantic Analysis based text mining techniques to detect similarity between patent documents and scientific publications [O]. Magerman Tom, Van Looy Bart, Song Xiaoyan.
