Twitter Dataset for Hate Speech and Cyberbullying Detection in Indonesian Language

Abstract

During the 2019 election period in Indonesia, many cases of hate speech and cyberbullying occurred on social media platforms, including Twitter. The government tried to filter out all negative content spread during this period. However, detecting hate speech is not an easy task. This paper presents the process of developing a dataset that can be used to build a hate speech detection model. More than 1 million tweets were collected using the Twitter API. Basic preprocessing and a preliminary study using machine learning were carried out. The Latent Dirichlet Allocation (LDA) algorithm was used to extract the topic of each tweet to see whether these topics can be associated with debate themes. A pretrained sentiment analysis model was also applied to the dataset to generate a polarity score for each tweet. Of the 83,752 tweets included in the analysis step, the numbers of positive and negative tweets are almost the same.
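
The per-tweet topic extraction mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which LDA implementation, number of topics, or hyperparameters the authors used, so everything below is an assumption: a small collapsed Gibbs sampler written with the Python standard library, hypothetical function names (`lda_gibbs`, `dominant_topic`), illustrative values for `alpha` and `beta`, and a made-up list of tokenized toy tweets.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic counts.

    docs is a list of token lists. alpha and beta are the symmetric
    Dirichlet priors (illustrative values, not taken from the paper).
    """
    rng = random.Random(seed)
    vocab_size = len({word for doc in docs for word in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens per topic, per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # total tokens per topic
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        topics = [rng.randrange(num_topics) for _ in doc]
        assignments.append(topics)
        for word, k in zip(doc, topics):
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1
                # Resample the topic from its conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1
    return doc_topic


def dominant_topic(topic_counts):
    """Index of the topic that holds the most tokens of one document."""
    return max(range(len(topic_counts)), key=lambda k: topic_counts[k])


# Hypothetical, already-tokenized toy tweets (not from the dataset).
toy_tweets = [
    ["economy", "jobs", "prices", "economy"],
    ["jobs", "prices", "tax"],
    ["corruption", "law", "court"],
    ["law", "court", "corruption", "police"],
]
counts = lda_gibbs(toy_tweets, num_topics=2)
for tweet, topic_counts in zip(toy_tweets, counts):
    print(dominant_topic(topic_counts), tweet)
```

Each tweet is labeled with its dominant topic, which is the kind of per-tweet topic that could then be compared against debate themes. On a corpus of real size, an established library implementation would be the more practical choice.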

Bibliographic record

Source
International Conference on Information Management and Technology | 2019 | pp. 379-382 | 4 pages
Authors
Trisna Febriana; Arif Budiarto
Original format: PDF
Keywords
Twitter; Support vector machines; Voting; Sentiment analysis; Media

Similar literature

1. Hate Speech Detection in Indonesian Twitter using Contextual Embedding Approach [J] . Guntur Budi Herwanto, Annisa Maulida Ningtyas, I Gede Mujiyatna . Indonesian Journal of Computing and Cybernetics Systems . 2021, No. 2
2. How well do hate speech, toxicity, abusive and offensive language classification models generalize across datasets? [J] . Paula Fortuna, Juan Soler-Company, Leo Wanner . Information Processing & Management . 2021, No. 3
3. Hate speech detection: A solved problem? The challenging case of long tail on Twitter [J] . Zhang Ziqi, Luo Lei . Semantic Web . 2019, No. 5
4. Twitter Dataset for Hate Speech and Cyberbullying Detection in Indonesian Language [C] . Trisna Febriana, Arif Budiarto . International Conference on Information Management and Technology . 2019
5. Hate Speech Detection in Twitter: A Selectively Trained Ensemble Method [D] . Houston, Jackson . 2020
6. Hate Speech Emotions and Gender Identities: A Study of Social Narratives on Twitter with Trainee Teachers [O] . Delfín Ortega-Sánchez, Joan Pagès Blanch, Jaime Ibáñez Quintana . 2021
7. L-HSAB: A Levantine Twitter Dataset for Hate Speech and Abusive Language [O] . Hala Mulki, Hatem Haddad, Chedi Bechikh Ali . 2019
