
Sundanese Twitter Dataset for Emotion Classification

Abstract

The Sundanese are the second-largest ethnic group in Indonesia, and their language has many dialects. This has drawn the attention of many researchers to emotion analysis, especially on social media. However, because Sundanese datasets are barely available, understanding emotion in Sundanese text remains a challenging task. In this research, we propose a dataset for emotion classification of Sundanese text. The preprocessing includes case folding, stopword removal, stemming, tokenizing, and text representation. Prior to classification, we generate features using term frequency-inverse document frequency (TF-IDF). We evaluate our dataset using k-fold cross-validation. Our experiments with the proposed method show effective results for machine learning classification. Furthermore, as far as we know, this is the first publicly available Sundanese emotion dataset.
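
The pipeline named in the abstract (case folding, stopword removal, tokenizing, TF-IDF features, k-fold evaluation) might be sketched roughly as follows. This is a minimal standard-library illustration, not the authors' implementation: the stopword list, the function names `preprocess`, `tfidf`, and `k_fold_indices`, and the particular TF-IDF weighting (relative term frequency times an unsmoothed log inverse document frequency) are all assumptions, and stemming is left out because it would need a Sundanese-specific stemmer.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list for illustration only; not the authors' list.
STOPWORDS = {"nu", "jeung", "di", "ka"}


def preprocess(text, stopwords=STOPWORDS):
    """Case-fold, tokenize into runs of letters, and drop stopwords.

    Stemming is intentionally omitted (assumption: it would be applied
    here with a language-specific stemmer).
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in stopwords]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


if __name__ == "__main__":
    # Made-up example strings, not taken from the dataset.
    tweets = ["Abdi BUNGAH pisan!", "abdi sedih"]
    docs = [preprocess(t) for t in tweets]
    for vector in tfidf(docs):
        print(vector)
    for train, test in k_fold_indices(len(docs), 2):
        print(train, test)
```

In practice the folds would usually be shuffled and stratified by emotion label, and each fold's TF-IDF statistics would be computed on the training split only; the sketch skips both for brevity.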
