Conference: Colloquium in Information Science and Technology

Some methods to address the problem of unbalanced sentiment classification in an Arabic context

Abstract

The rise of social media, such as online web forums and social networking sites, has drawn interest in mining and analyzing the opinions available on the web. Online opinion has become an object of study in many research areas, especially the one called “Opinion Mining and Sentiment Analysis”. Several interesting and advanced works have been carried out on a few languages, English in particular, but very few studies address languages such as Arabic. This paper presents the study we have carried out to address the problem of unbalanced data sets in supervised sentiment classification in an Arabic context. We propose three different methods to under-sample the majority-class documents. Our goal is to compare the effectiveness of the proposed methods with that of common random under-sampling. We also aim to evaluate the behavior of the classifier under different under-sampling rates. We use two common classifiers, namely Naïve Bayes and Support Vector Machines. The experiments are carried out on an Arabic data set that we built from Aljazeera's web site and labeled manually. The results show that Naïve Bayes is sensitive to data set size: the more we reduce the data, the more the results degrade. However, it is not sensitive to unbalanced data sets, in contrast to Support Vector Machines, which are highly sensitive to them. The results also show that we can rely on the proposed techniques and that they are typically competitive with random under-sampling.
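For readers unfamiliar with the baseline mentioned in the abstract, random under-sampling can be sketched as follows. This is a minimal illustration only, not the authors' code: the function name `random_under_sample`, its parameters, and the reading of the "under-sampling rate" as the fraction of majority-class documents removed are all assumptions. The three proposed methods are not described in the abstract and are not shown here.

```python
import random


def random_under_sample(docs, labels, majority_label, rate, seed=0):
    """Randomly drop a fraction `rate` of the majority-class documents.

    Assumption: `rate` is the share of majority-class documents removed
    (0.0 keeps all of them, 1.0 removes all of them). The abstract does
    not define the rate, so this interpretation is hypothetical.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)
    # Positions of the majority-class documents.
    majority_idx = [i for i, y in enumerate(labels) if y == majority_label]
    # Number of majority-class documents to keep after under-sampling.
    n_keep = round(len(majority_idx) * (1.0 - rate))
    kept_majority = set(rng.sample(majority_idx, n_keep))
    # Keep every minority document and only the sampled majority documents,
    # preserving the original order.
    keep = [
        i for i, y in enumerate(labels)
        if y != majority_label or i in kept_majority
    ]
    return [docs[i] for i in keep], [labels[i] for i in keep]


if __name__ == "__main__":
    # Toy data set: 8 negative (majority) and 2 positive (minority) documents.
    docs = [f"doc{i}" for i in range(10)]
    labels = ["neg"] * 8 + ["pos"] * 2
    sampled_docs, sampled_labels = random_under_sample(docs, labels, "neg", 0.5)
    print(sampled_labels.count("neg"), sampled_labels.count("pos"))
```

Varying `rate` in a loop and retraining a classifier at each value would be one way to observe the sensitivity to under-sampling rate that the abstract reports, under the same assumptions.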