HYBRID MODEL FOR TWITTER DATA SENTIMENT ANALYSIS BASED ON ENSEMBLE OF DICTIONARY BASED CLASSIFIER AND STACKED MACHINE LEARNING CLASSIFIERS-SVM, KNN AND C5.0

SANGEETA RANI; NASIB SINGH GILL

Abstract

Social networking sites such as Twitter and Facebook let users express their opinions on various topics and events. Opinion mining is a technique for finding people's sentiment about these topics, which can be useful in decision support. Government policies can also be monitored through sentiment analysis of related tweets. The objective of this research is to improve the accuracy of Twitter sentiment classification. The paper proposes a framework for a hybrid approach that combines an ensemble of stacked machine learning algorithms with a dictionary-based classifier. The sentiment score extracted from the dictionary-based classifier is added to the feature set as an additional feature. Three machine learning algorithms, SVM, KNN and C5.0, are stacked to build an ensemble using two meta-learners, RF and GLM. Real-time, manually labeled tweets about the “Clean India Mission”, an Indian government policy, are used to implement the model. The proposed model is compared with different machine learning and ensemble classifiers. The proposed hybrid model recorded a higher accuracy of 0.9066377 for 5-fold cross-validation and 0.9124793 for 10-fold cross-validation, compared with 0.8667328 for the stacked ensemble of SVMRadial, KNN and C5.0 using RF as the meta-classifier. The RF meta-classifier performed better than GLM in all stacking-based ensembles. The proposed model also recorded higher accuracy than the machine learning classifiers SVM, Naïve Bayes, Decision Tree, Random Forest and Maximum Entropy. The contribution of the research is to improve the accuracy of stacking-based ensemble classifiers for Twitter sentiment classification by using the additional sentiment score provided by the dictionary-based classifier.
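The feature-augmentation step described in the abstract, where a dictionary-based sentiment score is appended to each tweet's feature vector, might look roughly like the sketch below. This is an illustration only, not the authors' implementation: the toy lexicon, the bag-of-words representation, and the names `LEXICON`, `tokenize`, `sentiment_score` and `featurize` are all assumptions, since the abstract does not name the dictionary or the feature extraction used.

```python
import re

# Hypothetical toy lexicon (assumption): the abstract does not say
# which sentiment dictionary the paper uses.
LEXICON = {"clean": 1, "good": 1, "great": 1, "dirty": -1, "bad": -1, "fail": -1}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(tokens, lexicon=LEXICON):
    """Sum the polarity of every token found in the lexicon."""
    return sum(lexicon.get(token, 0) for token in tokens)


def featurize(tweets):
    """Build bag-of-words rows and append the dictionary score as a last column."""
    token_lists = [tokenize(tweet) for tweet in tweets]
    vocab = sorted({token for tokens in token_lists for token in tokens})
    rows = []
    for tokens in token_lists:
        counts = [tokens.count(word) for word in vocab]
        # The extra column is the dictionary-based sentiment score.
        rows.append(counts + [sentiment_score(tokens)])
    return vocab, rows


tweets = [
    "Great work, streets are clean",
    "Bad plan, the city is still dirty",
]
vocab, rows = featurize(tweets)
for tweet, row in zip(tweets, rows):
    print(row[-1], tweet)
```

Rows built this way, with word counts plus one sentiment-score column, would then be passed to the stacked base learners (SVM, KNN and C5.0 in the paper) in place of the word counts alone.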

