Conference: International Conference on Sustainable Information Engineering and Technology

Sentiment Analysis on Movie Reviews Using Ensemble Features and Pearson Correlation Based Feature Selection


Abstract

Microblogging has become a very popular information medium among internet users, which makes it a rich source of opinions and reviews, particularly movie reviews. We propose a sentiment analysis approach for movie reviews that combines ensemble features with Bag of Words features and applies Pearson's correlation based feature selection. Feature selection is used to improve classification performance, reduce the dimensionality of the feature set, and obtain an optimal feature combination. Classification uses several Naïve Bayes models: Bernoulli Naïve Bayes for binary data, Gaussian Naïve Bayes for continuous data, and Multinomial Naïve Bayes for numeric data. On tweets containing non-standard words, the evaluation yielded 82% accuracy, 86% precision, 79.62% recall, and an F-measure of 82.69% with feature selection at 20%. After manual word standardization, accuracy rose by 8 percentage points to 90%, with 92% precision, 88.46% recall, and an F-measure of 90.19% with feature selection at 85%. These results suggest that word standardization improves classification performance, and that Pearson's correlation based feature selection yields an optimal feature combination while reducing the total number of feature dimensions.
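The abstract does not give implementation details, so the following is only a minimal sketch of how Pearson correlation based feature selection over Bag of Words features might look. The function names (`bag_of_words`, `pearson`, `select_features`), the `keep_fraction` parameter, and the four toy reviews are all assumptions made for illustration; they are not taken from the study, and the ensemble features and Naïve Bayes classifiers are not reproduced here.

```python
import math

def bag_of_words(docs):
    """Build a sorted vocabulary and a binary document-term matrix."""
    tokenized = [set(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*tokenized))
    rows = [[1 if term in tokens else 0 for term in vocab] for tokens in tokenized]
    return vocab, rows

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def select_features(rows, labels, keep_fraction):
    """Rank features by |r| against the label and keep the top fraction.

    keep_fraction is a hypothetical parameter standing in for the
    percentage of features retained (e.g. 0.20 or 0.85).
    """
    n_features = len(rows[0])
    scores = [abs(pearson([row[j] for row in rows], labels))
              for j in range(n_features)]
    k = max(1, int(keep_fraction * n_features))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

# Toy data for illustration only (1 = positive, 0 = negative).
docs = [
    "great movie loved it",
    "terrible movie hated it",
    "loved the acting great",
    "hated the plot terrible",
]
labels = [1, 0, 1, 0]

vocab, rows = bag_of_words(docs)
kept = select_features(rows, labels, keep_fraction=0.5)
selected_terms = [vocab[j] for j in kept]
print(selected_terms)
```

In this sketch, words that appear equally often in both classes (such as "movie" or "it") receive a correlation of zero and are dropped, while sentiment-bearing words are retained. The reduced matrix could then be passed to a Naïve Bayes classifier of the appropriate type.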
