Evaluating Machine Learning Techniques for Detecting Offensive and Hate Speech in South African Tweets

Oriola Oluwafemi; Kotze Eduan

Abstract

In recent times, South Africa has witnessed a surge of offensive and hate speech along racial and ethnic lines on Twitter. English is popular among the South African languages used. Although machine learning has been used successfully to detect offensive and hate speech in several English contexts, the distinctiveness of South African tweets and the similarities among offensive, hate and free speech call for a domain-specific English corpus and techniques. We therefore developed an English corpus from South African tweets and evaluated different machine learning techniques for detecting offensive and hate speech. Character n-gram, word n-gram, negative sentiment and syntactic-based features, as well as their hybrid, were extracted and analyzed using hyper-parameter optimization, ensemble and multi-tier meta-learning models of support vector machine, logistic regression, random forest and gradient boosting algorithms. The results showed that the optimized support vector machine with character n-grams performed best in detecting hate speech, with a true positive rate of 0.894, while optimized gradient boosting with word n-grams performed best in detecting offensive speech, with a true positive rate of 0.867. However, their performance in detecting the other classes was poor. Multi-tier meta-learning models achieved the most consistent and balanced classification performance, with true positive rates of 0.858 and 0.887 for hate speech and offensive speech, respectively, a true positive rate of 0.646 for free speech, and an overall accuracy of 0.671. The error analysis showed that the multi-tier meta-learning model could reduce the misclassification error rate of the optimized models by 34.26%.
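To make two terms from the abstract concrete, the following is a minimal Python sketch of character n-gram extraction and of the per-class true positive rate (the fraction of items in a class that the classifier labels correctly). It is not the authors' pipeline: the function names, the n-gram range of 2 to 3, and the label strings `hate`, `offensive` and `free` are assumptions chosen for illustration, and the toy labels are not data from the study.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=3):
    """Count overlapping character n-grams of lengths n_min..n_max (assumed range)."""
    text = text.lower()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def true_positive_rates(y_true, y_pred):
    """Per-class true positive rate: correct predictions / actual class members."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def accuracy(y_true, y_pred):
    """Overall accuracy: share of all items labelled correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy labels for illustration only; not taken from the paper's corpus.
    y_true = ["hate", "hate", "offensive", "free"]
    y_pred = ["hate", "free", "offensive", "free"]
    print(char_ngrams("abc"))
    print(true_positive_rates(y_true, y_pred))
    print(accuracy(y_true, y_pred))
```

Reporting a true positive rate per class alongside overall accuracy, as the abstract does, shows when a model that is strong on one class performs poorly on the others.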

Bibliographic details

Source
Quality Control, Transactions | 2020, issue 2020 | pp. 21496-21509 | 14 pages
Authors
Oriola Oluwafemi; Kotze Eduan
Author affiliations

Univ Free State, Dept Comp Sci & Informat, ZA-9301 Bloemfontein, South Africa;

Univ Free State, Dept Comp Sci & Informat, ZA-9301 Bloemfontein, South Africa;

Original format: PDF
Language: English
Keywords
Machine learning; South Africa; Twitter; hate speech; offensive speech
