A Chinese short text semantic similarity computation model based on stop words and TongyiciCilin

Abstract

Short text similarity computation plays an important role in natural language processing and can be applied to many tasks. In recent years, many studies have achieved important results in natural language processing; however, while there are good results for English, there has been no major breakthrough for Chinese. Unlike previously proposed methods, we retain stop words in the word-vector training dataset to reflect the characteristics of Chinese, and we add TongyiciCilin (a Chinese synonym thesaurus) to the training data of the short text semantic similarity computation model. We compare the effect of the Word2vec and GloVe methods in our model, using a Chinese short text semantic similarity dataset designed by Chinese grammar experts. The results show that retaining stop words in the word-vector training data and adding TongyiciCilin to the training data improve the model's accuracy by 2%-3%. On the same test dataset, our model is more accurate than the Baidu short text similarity calculation platform.
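
The abstract does not describe how the similarity model itself is built. As a rough, non-authoritative illustration of the two ideas it names, the Python sketch below scores two pre-segmented short texts by the cosine similarity of their averaged word vectors, with a switch for keeping or dropping stop words, and turns thesaurus synonym groups into extra positive training pairs. This is not the authors' code: the toy vectors, the stop-word set, the helper names (`mean_vector`, `similarity`, `thesaurus_pairs`), and the idea of feeding TongyiciCilin in as labelled synonym pairs are all hypothetical assumptions. Real vectors would come from Word2vec or GloVe, and the thesaurus is assumed to be already parsed into lists of synonyms.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Vector = List[float]

# Hypothetical toy embeddings; real ones would come from Word2vec or GloVe.
VECTORS: Dict[str, Vector] = {
    "我": [1.0, 0.0, 0.0],
    "的": [0.0, 1.0, 0.0],
    "喜欢": [0.0, 0.0, 1.0],
    "爱": [0.0, 0.2, 0.9],
}

# Hypothetical stop-word list (real lists are much longer).
STOP_WORDS: Set[str] = {"我", "的"}


def mean_vector(tokens: Sequence[str], vectors: Dict[str, Vector],
                stop_words: Set[str], keep_stop_words: bool) -> Vector:
    """Average the vectors of in-vocabulary tokens, optionally skipping stop words."""
    dims = len(next(iter(vectors.values())))
    total = [0.0] * dims
    count = 0
    for tok in tokens:
        if not keep_stop_words and tok in stop_words:
            continue
        vec = vectors.get(tok)
        if vec is None:  # out-of-vocabulary token: ignore it
            continue
        total = [t + v for t, v in zip(total, vec)]
        count += 1
    return [t / count for t in total] if count else total


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity(tokens_a: Sequence[str], tokens_b: Sequence[str],
               vectors: Dict[str, Vector], stop_words: Set[str],
               keep_stop_words: bool = True) -> float:
    """Score two pre-segmented short texts by the cosine of their mean vectors."""
    return cosine(mean_vector(tokens_a, vectors, stop_words, keep_stop_words),
                  mean_vector(tokens_b, vectors, stop_words, keep_stop_words))


def thesaurus_pairs(synonym_groups: Iterable[Sequence[str]]) -> List[Tuple[str, str, int]]:
    """Hypothetical augmentation: each pair inside a synonym group becomes a positive (label 1) example."""
    pairs = []
    for group in synonym_groups:
        for a, b in combinations(group, 2):
            pairs.append((a, b, 1))
    return pairs


if __name__ == "__main__":
    a, b = ["我", "喜欢"], ["我", "爱"]
    print(round(similarity(a, b, VECTORS, STOP_WORDS, keep_stop_words=True), 3))   # 0.988
    print(round(similarity(a, b, VECTORS, STOP_WORDS, keep_stop_words=False), 3))  # 0.976
    print(thesaurus_pairs([["喜欢", "爱", "喜爱"]]))
```

Note that the `keep_stop_words` flag here acts at scoring time, which is a simplification: the paper keeps stop words in the word-vector training corpus, which would change the vectors themselves rather than just which tokens are averaged.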

Bibliographic details

Source: International Conference on Computer Science and Network Technology, 2017, p. 536 (5 pages)
Authors: Tang Shancheng; Bai Yunyue; Ma Fuyu
Original format: PDF
Chinese Library Classification: TP3-53
Keywords: Semantics; Computational modeling; Training; Testing; Training data; Data models; Natural language processing

Date added to database: 2022-08-20 23:16:39

Similar documents

1. A Comparison of Approaches for Measuring the Semantic Similarity of Short Texts Based on Word Embeddings [J]. Karlo Babić, Francesco Guerra, Sanda Martinčić-Ipšić. Journal of Information and Organizational Sciences, 2020, no. 2.

2. Mining Keywords from Short Text Based on LDA-Based Hierarchical Semantic Graph Model [J]. International Journal of Information Systems in the Service Sector, 2020, no. 2.

3. Learning short-text semantic similarity with word embeddings and external knowledge sources [J]. Nguyen Hien T., Duong Phuc H., Cambria Erik. Knowledge-Based Systems, 2019, Oct. 15.

4. A Chinese short text semantic similarity computation model based on stop words and TongyiciCilin [C]. Tang Shancheng, Bai Yunyue, Ma Fuyu. International Conference on Computer Science and Network Technology, 2017.

5. Short-Text Semantic Similarity: Algorithms and Applications [D]. Sultan, Md Arafat. 2016.

6. Similarity of fMRI Activity Patterns in Left Perirhinal Cortex Reflects Semantic Similarity between Words [O]. Rose Bruffaerts, Patrick Dupont, Ronald Peeters. 2013.

7. Similarity Calculation Method of Chinese Short Text Based on Semantic Feature Space [O]. Liqiang Pan, Pu Zhang, Anping Xiong. 2015.

