Conference: IEEE International Conference on Cloud Computing and Big Data Analysis

Chinese Weibo sentiment analysis based on character embedding with dual-channel convolutional neural network

Abstract

Sentiment analysis is one of the most compelling tasks in natural language processing (NLP) and continues to grow in popularity. In this paper, we propose a new sentiment analysis method that combines pre-trained character embeddings with a dual-channel convolutional neural network (char-DCCNN) to classify the sentiment of short Chinese comments on Sina Weibo. First, we split the Chinese corpus into single characters and train character vectors on them; characters that appear infrequently are randomly initialized. The vector matrix representing each text is then fed into a two-channel convolutional neural network. The vectors in one channel remain static (serving as a kind of global feature), while those in the other channel are fine-tuned on the input data (serving as a kind of local feature). Finally, we record the training performance, the validation performance, and the final average validation performance to reflect the sentiment classification results. The data consist of the NLPCC2012 micro-blog sentiment analysis dataset and reviews from sports, film, social, and other fields crawled from Sina Weibo, evaluated with 10-fold cross-validation. Three experiments demonstrate the method's performance from three aspects: embedding, activation function, and channel. Overall, the proposed char-DCCNN improves sentiment classification results on short Chinese Weibo comments and has practical significance.
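
The preprocessing and two-channel input described in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `min_count` frequency threshold, the uniform initialization range, and the choice to sum both channels inside each filter (as is common in multichannel text CNNs) are assumptions that the abstract does not specify.

```python
import random


def split_chars(text):
    """Split a string into single characters, dropping whitespace."""
    return [ch for ch in text if not ch.isspace()]


def build_embeddings(corpus, pretrained, dim, min_count=2, seed=0):
    """Build a character -> vector table.

    Characters seen at least `min_count` times that have a pre-trained
    vector keep it; all others are randomly initialized.
    `min_count` and the range [-0.25, 0.25] are assumed values.
    """
    rng = random.Random(seed)
    counts = {}
    for text in corpus:
        for ch in split_chars(text):
            counts[ch] = counts.get(ch, 0) + 1
    table = {}
    for ch, n in counts.items():
        if n >= min_count and ch in pretrained:
            table[ch] = list(pretrained[ch])
        else:
            table[ch] = [rng.uniform(-0.25, 0.25) for _ in range(dim)]
    return table


def two_channel_input(text, table):
    """Return two independent copies of the text's vector matrix.

    The first copy is meant to stay fixed (static channel); the second
    would be updated during training (fine-tuned channel).
    """
    chars = split_chars(text)
    static = [list(table[ch]) for ch in chars]
    tuned = [list(table[ch]) for ch in chars]
    return static, tuned


def conv_max_feature(static, tuned, filt_static, filt_tuned, bias=0.0):
    """Apply one filter of width h to both channels, then ReLU and
    max-over-time pooling. Summing the channels is an assumption."""
    h = len(filt_static)
    best = 0.0  # ReLU outputs are never negative
    for i in range(len(static) - h + 1):
        s = bias
        for j in range(h):
            s += sum(a * b for a, b in zip(static[i + j], filt_static[j]))
            s += sum(a * b for a, b in zip(tuned[i + j], filt_tuned[j]))
        best = max(best, max(0.0, s))
    return best


# Toy usage with made-up two-dimensional vectors.
corpus = ["今天很开心", "今天不开心"]
pretrained = {"今": [1.0, 0.0], "天": [0.0, 1.0], "开": [1.0, 1.0], "心": [0.5, 0.5]}
table = build_embeddings(corpus, pretrained, dim=2)
static, tuned = two_channel_input("今天很开心", table)
identity_filter = [[1.0, 0.0], [0.0, 1.0]]
feature = conv_max_feature(static, tuned, identity_filter, identity_filter)
```

In a real model the fine-tuned channel and the filters would be updated by backpropagation while the static channel is excluded from the gradient step; the sketch only shows how the two channels start from the same vectors and are kept as separate copies.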