Comparison on the Rule based Method and Statistical based Method on Emotion Classification for Indonesian Twitter Text

Abstract

In this study, we conducted experiments on emotion classification of Indonesian Twitter text. For these experiments, we built a corpus of 7622 labeled Twitter texts taken from 69 Twitter accounts and manually labeled by 5 native speakers. We used 6 basic emotion labels (angry, disgust, fear, joy, sad, surprise) and added one label for a neutral emotion class. We compared a rule based method with a statistical based method. In the rule based method, we employed the existing Synesketch algorithm with two types of emotion word list: a manually written list and a translated WordNet-Affect list. In the statistical based method, we employed the SVM (Support Vector Machine) algorithm with unigram features and the feature selection algorithms Information Gain and Minimum Frequency. In addition to the pure statistical based method, we also used the manually built emotion word list in the SVM based classification. In the text pre-processing, we compared several methods: normalization, emotion conversion, stop word removal, number removal, and one-character token removal. The experimental results showed that the statistical based method, with an accuracy of 71.740%, outperformed the rule based method, with an accuracy of 63.172%. To improve on this result, we employed SMOTE to handle the imbalanced data and achieved the best result, an f-measure of 83.203%. In another experiment, we combined the pure statistical method with the rule based method by adding the manually built word list to the classification features. The f-measure for this experiment reached only 81.592%.
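The pre-processing and Minimum Frequency steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the stop-word list, the sample texts, the threshold `min_freq=2`, and the function names `preprocess`, `select_by_min_frequency`, and `to_vector` are all assumptions made for illustration, and "normalization" is reduced here to lowercasing.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; not taken from the paper.
STOP_WORDS = {"yang", "dan", "di", "ke"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase the text, then drop numbers, stop words and one-character tokens."""
    tokens = re.findall(r"[a-z]+|\d+", text.lower())
    kept = []
    for tok in tokens:
        if tok.isdigit():        # number removal
            continue
        if len(tok) == 1:        # one-character token removal
            continue
        if tok in stop_words:    # stop-word removal
            continue
        kept.append(tok)
    return kept


def select_by_min_frequency(docs, min_freq=2):
    """Keep unigrams that occur at least min_freq times across all documents."""
    counts = Counter(tok for doc in docs for tok in doc)
    return sorted(tok for tok, c in counts.items() if c >= min_freq)


def to_vector(doc, vocabulary):
    """Represent one tokenized document as unigram counts over the vocabulary."""
    counts = Counter(doc)
    return [counts[tok] for tok in vocabulary]


# Made-up sample texts, assumed for illustration.
texts = [
    "Aku senang sekali hari ini 123",
    "Dia senang dan aku sedih",
    "Hari yang sedih x",
]
docs = [preprocess(t) for t in texts]
vocabulary = select_by_min_frequency(docs, min_freq=2)
vectors = [to_vector(d, vocabulary) for d in docs]
```

The resulting count vectors are the kind of unigram features that could then be passed to an SVM classifier; Information Gain selection and SMOTE oversampling are not shown here.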

Bibliographic details

Source: International Conference on Information Technology Systems and Innovation | 2015 | 421p | 6 pages
Authors: Aldy Rialdy Atmadja; Ayu Purwarianti
Original format: PDF
Chinese Library Classification: G202-53
Keywords: Emotion Classification; Indonesian Twitter text; Rule based method; Statistical based method; Feature selection; Support Vector Machine; SMOTE


