International Conference on Computer Science and Network Technology

A Chinese short text semantic similarity computation model based on stop words and TongyiciCilin

Abstract

Short text similarity computation plays an important role in natural language processing and applies to many tasks. In recent years, much research has produced important results in natural language processing. Although there are good results for English, there has been no major breakthrough for Chinese. Unlike previously proposed methods, we retain stop words in the word vector training dataset to account for characteristics of Chinese, and we add TongyiciCilin to the training data of the short text semantic similarity computation model. We compare the effect of Word2vec and GloVe in our model, using a Chinese short text semantic similarity dataset designed by Chinese grammar experts. The results show that retaining stop words in the word vector training data and adding TongyiciCilin to the training data improves the model's accuracy by 2%-3%. On the same test dataset, our model is more accurate than the Baidu short text similarity calculation platform.
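The two data-preparation ideas in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the stop word list, the synonym groups standing in for TongyiciCilin entries, and the function names `prepare_tokens` and `synonym_pairs` are all hypothetical, and the abstract does not say how the synonym data is encoded for training. The sketch assumes pre-segmented text and assumes that "不" (not) appears on the stop word list, which shows one plausible reason to retain stop words: dropping it would make a sentence and its negation identical.

```python
# Toy illustration only. The stop word list, synonym groups, and function
# names are hypothetical and are not taken from the paper.

STOP_WORDS = {"的", "了", "吗", "不"}  # tiny placeholder list

# Each inner list stands in for one TongyiciCilin synonym group.
SYNONYM_GROUPS = [
    ["高兴", "开心", "快乐"],
    ["美丽", "漂亮"],
]


def prepare_tokens(tokens, keep_stop_words=True):
    """Return the token list that would be fed to word-vector training."""
    if keep_stop_words:
        return list(tokens)
    return [t for t in tokens if t not in STOP_WORDS]


def synonym_pairs(groups):
    """Turn each synonym group into (word_a, word_b, label) pairs, label 1 = similar."""
    pairs = []
    for group in groups:
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                pairs.append((group[i], group[j], 1))
    return pairs


if __name__ == "__main__":
    negated = ["我", "不", "喜欢", "你"]   # "I do not like you"
    plain = ["我", "喜欢", "你"]           # "I like you"

    # With stop words removed, the two sentences collapse to the same tokens.
    print(prepare_tokens(negated, keep_stop_words=False) == plain)
    # With stop words retained, the negation is preserved.
    print(prepare_tokens(negated) == plain)

    # Synonym pairs that could be appended to similarity training data.
    for pair in synonym_pairs(SYNONYM_GROUPS):
        print(pair)
```

In a fuller pipeline, the token lists would be passed to a word-vector trainer such as Word2vec or GloVe, and the synonym pairs would be merged with the labeled short text pairs; both steps are left out here because the abstract does not specify them.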
