Journal: International Journal of Web Information Systems

Detecting hate speech against politicians in Arabic community on social media


Abstract

Purpose - This paper proposes an approach for detecting hate speech against politicians in the Arabic community on social media (e.g. YouTube). Similar work has been presented in the literature for other languages such as English. However, to the best of the authors' knowledge, little work has been conducted for the Arabic language.

Design/methodology/approach - The approach uses both classical classification algorithms and deep learning algorithms. For the classical algorithms, the authors use Gaussian NB (GNB), Logistic Regression (LR), Random Forest (RF), SGD Classifier (SGD) and Linear SVC (LSVC). For the deep learning classification, four algorithms are applied: convolutional neural network (CNN), multilayer perceptron (MLP), long short-term memory (LSTM) and bi-directional long short-term memory (Bi-LSTM). For feature extraction, the authors use both Word2vec and FastText with their two implementations, namely Skip Gram (SG) and Continuous Bag of Words (CBOW).

Findings - Simulation results show that LSVC, Bi-LSTM and MLP perform best, achieving an accuracy of up to 91% when combined with the SG model. The results also show that classification on a balanced corpus is more accurate than classification on an unbalanced corpus.

Originality/value - The principal originality of this paper is the construction of a new hate speech corpus (Arabic_fr_en), annotated by three different annotators. The corpus contains the three languages used by Arabic people, namely Arabic, French and English. For Arabic, the corpus contains both Arabic script and Arabizi (i.e. Arabic words written with Latin letters). Another original aspect is the reliance on both shallow and deep learning classification, using different feature extraction models such as Word2vec and FastText with their two implementations, SG and CBOW.
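The abstract reports that classification on a balanced corpus is more accurate than on an unbalanced one, but it does not say how the balanced corpus was obtained. As a minimal sketch, assuming random undersampling of the larger classes (one common option, not necessarily the authors' method), balancing a labelled corpus could look like this. The function name `undersample`, the label names and the toy comments are all hypothetical.

```python
import random
from collections import defaultdict


def undersample(samples, seed=0):
    """Randomly downsample every class to the size of the smallest class.

    `samples` is a list of (text, label) pairs. The undersampling strategy
    is an illustrative assumption, not the procedure used in the paper.
    """
    if not samples:
        return []

    # Group the samples by their label.
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    smallest = min(len(group) for group in by_label.values())
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible

    # Draw the same number of samples from every class, then shuffle.
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy corpus: 6 non-hateful comments and 2 hateful ones.
corpus = [(f"comment {i}", "non-hate") for i in range(6)] + [
    (f"comment {i}", "hate") for i in range(6, 8)
]
balanced_corpus = undersample(corpus)
```

Undersampling discards data from the majority class; oversampling the minority class or weighting classes in the classifier are alternatives that keep the whole corpus.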

Bibliographic details

  • Author affiliations

    Laboratoire des Methodes de Conception des Systemes, Ecole Nationale Superieure d'Informatique, Algiers, Algeria; School of Engineering and Applied Science (EAS), Aston University, Birmingham, UK; and Folding Space, Birmingham, UK;

    School of Mathematics and Computer Science, University of Wolverhampton, Wolverhampton, UK;

    Laboratoire des Methodes de Conception des Systemes, Ecole Nationale Superieure d'Informatique, Algiers, Algeria;

    Laboratoire des Methodes de Conception des Systemes, Ecole Nationale Superieure d'Informatique, Algiers, Algeria;

    Laboratoire des Methodes de Conception des Systemes, Ecole Nationale Superieure d'Informatique, Algiers, Algeria;

    Laboratoire des Methodes de Conception des Systemes, Ecole Nationale Superieure d'Informatique, Algiers, Algeria;

  • Original format: PDF
  • Text language: English (eng)
  • Keywords

    Arabic hate speech;
