International Conference on Information Management and Technology

Twitter Dataset for Hate Speech and Cyberbullying Detection in Indonesian Language



Abstract

During the 2019 election period in Indonesia, many cases of hate speech and cyberbullying occurred on social media platforms, including Twitter. The government tried to filter out negative content being spread during this period. However, detecting hate speech is not an easy task. This paper presents the process of developing a dataset that can be used to build a hate speech detection model. More than 1 million tweets were collected using the Twitter API. Basic preprocessing and a preliminary study using machine learning were carried out. The Latent Dirichlet Allocation (LDA) algorithm was used to extract a topic for each tweet, to see whether these topics could be associated with debate themes. A pretrained sentiment analysis model was also applied to the dataset to generate a polarity score for each tweet. Of the 83,752 tweets included in the analysis step, the numbers of positive and negative tweets are almost the same.
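
The abstract mentions basic preprocessing followed by a per-tweet polarity score, but does not say which cleaning steps or which pretrained model were used. The following is only a minimal sketch of that kind of pipeline under stated assumptions: the cleaning rules, the function names, and the tiny `TOY_LEXICON` are all hypothetical stand-ins (a word-list scorer is swapped in for the pretrained sentiment model), and the example tweets are invented for illustration.

```python
import re
from collections import Counter
from typing import List

# ASSUMPTION: a toy Indonesian word list standing in for the pretrained
# sentiment model mentioned in the abstract (not the authors' model).
TOY_LEXICON = {
    "bagus": 1.0,    # good
    "hebat": 1.0,    # great
    "buruk": -1.0,   # bad
    "bohong": -1.0,  # lie
    "benci": -1.0,   # hate
}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def preprocess(tweet: str) -> List[str]:
    """Lowercase, strip URLs and mentions, drop non-letters, split on whitespace."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)  # also removes '#', digits, punctuation
    return text.split()


def polarity(tokens: List[str]) -> float:
    """Mean lexicon score over all tokens; 0.0 for an empty token list."""
    if not tokens:
        return 0.0
    return sum(TOY_LEXICON.get(tok, 0.0) for tok in tokens) / len(tokens)


def label(score: float) -> str:
    """Map a polarity score to a coarse class by its sign."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def polarity_counts(tweets: List[str]) -> Counter:
    """Count how many tweets fall into each polarity class."""
    return Counter(label(polarity(preprocess(t))) for t in tweets)


if __name__ == "__main__":
    sample = ["Hebat sekali", "itu bohong", "debat malam ini"]
    print(polarity_counts(sample))
```

Comparing the `positive` and `negative` entries of the returned counter is the kind of tally behind the abstract's closing observation. A real replication would replace `TOY_LEXICON` with an actual pretrained model and add a topic model such as LDA as a separate step.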
