Published in: International Conference on Cyber and IT Service Management

Feature selection based on Genetic algorithm, particle swarm optimization and principal component analysis for opinion mining cosmetic product review

Abstract

Opinion mining is a technique for automatically processing textual data from opinion sentences to produce sentiment information. Also called sentiment analysis, it involves building a system that collects and classifies opinions in product reviews by understanding, extracting, and processing the text of each opinion sentence and labeling it as positive, negative, or neutral. One of the most widely used techniques for data classification is the Support Vector Machine (SVM). SVM identifies the separating hyperplane that maximizes the margin between two different classes. However, SVM is sensitive to parameter selection and to the choice of suitable features. In this research, the researchers improve on previous work by combining SVM with feature selection and comparing three feature selection methods: Genetic Algorithm, Particle Swarm Optimization, and Principal Component Analysis. The comparison determines which feature selection method best improves the classification accuracy of SVM. The dataset consists of cosmetic product reviews downloaded from www.amazon.com. Performance is measured as the accuracy of SVM with and without each feature selection method. The evaluation used 10-fold cross-validation, and accuracy was measured with the confusion matrix and the ROC curve. SVM alone obtained an average accuracy of 82.00% and an average AUC of 0.988. After integrating SVM with feature selection, Genetic Algorithm achieved an average accuracy of 94.00% and an average AUC of 0.984, Particle Swarm Optimization achieved an average accuracy of 97.00% and an average AUC of 0.988, and Principal Component Analysis achieved an average accuracy of 83.00% and an average AUC of 0.809. In conclusion, SVM showed the largest accuracy improvement when integrated with Particle Swarm Optimization feature selection, with accuracy increasing from 82.00% to 97.00%.
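The evaluation protocol named in the abstract (10-fold cross-validation, with accuracy taken from the confusion matrix and AUC from the ROC curve) can be illustrated with a minimal sketch. This is not the authors' implementation, and the abstract does not say which tool they used. The function names (k_fold_indices, confusion_matrix, accuracy, roc_auc) and the toy labels and scores below are hypothetical, chosen only to show how the two reported measures are computed for a binary classifier.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous (train, test) index pairs."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of correctly classified samples."""
    return (tp + tn) / (tp + fp + fn + tn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a random
    positive sample scores higher than a random negative one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (assumption): labels and classifier scores for one test fold.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1, 0.55, 0.45]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print("accuracy:", accuracy(*confusion_matrix(y_true, y_pred)))
    print("AUC:", roc_auc(y_true, scores))
    print("folds:", len(k_fold_indices(25, k=10)))
```

In a full experiment, each fold's training indices would be used to fit the feature selector and the SVM, the test indices would be scored, and the per-fold accuracy and AUC would be averaged, which is how averages such as those reported above are typically obtained.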
