A Review of Standard Text Classification Practices for Multi-label Toxicity Identification of Online Content

Abstract

Language toxicity identification presents a gray area in the ethical debate surrounding freedom of speech and censorship. Today's social media landscape is littered with unfiltered content that ranges from slightly abusive to hate-inducing. In response, we focus on training a multi-label classifier to detect both the type and the level of toxicity in online content. This content is typically colloquial and conversational in style, and its variability and inconsistency mean that classifying it requires large amounts of annotated data. We compare standard text classification methods on this task. A conventional one-vs-rest SVM classifier with character-level and word-level frequency-based representations of text reaches a ROC AUC score of 0.9763. We demonstrate that more advanced techniques, such as word embeddings, recurrent neural networks, attention mechanisms, stacking of classifiers, and semi-supervised training, can improve the ROC AUC score to 0.9862. We suggest that choosing the right model requires weighing both model accuracy and inference complexity against the needs of the application.
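
The abstract reports a single ROC AUC score for a multi-label task but does not say how the per-label scores are combined. The following is a minimal sketch, assuming the reported figure is the unweighted mean of the per-label ROC AUC values (an assumption, not something the record states). The function names `roc_auc` and `mean_columnwise_roc_auc` and the toy data are illustrative only and do not come from the paper.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC for one binary label, via the rank-sum (Mann-Whitney U) identity."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)

    # Assign 1-based ranks; tied scores share their average rank.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC needs at least one positive and one negative example")

    pos_rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_columnwise_roc_auc(
    Y_true: Sequence[Sequence[int]], Y_score: Sequence[Sequence[float]]
) -> float:
    """Unweighted mean of per-label ROC AUC (assumed aggregation, see lead-in)."""
    n_labels = len(Y_true[0])
    aucs = [
        roc_auc([row[c] for row in Y_true], [row[c] for row in Y_score])
        for c in range(n_labels)
    ]
    return sum(aucs) / n_labels


# Toy example: 4 comments, 2 hypothetical toxicity labels per comment.
Y_true = [[0, 1], [0, 0], [1, 1], [1, 0]]
Y_score = [[0.1, 0.9], [0.4, 0.2], [0.35, 0.8], [0.8, 0.3]]
score = mean_columnwise_roc_auc(Y_true, Y_score)
```

Because ROC AUC depends only on how the scores rank positive examples against negative ones, it can be computed per label without choosing a decision threshold, which is convenient when labels are heavily imbalanced.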

Bibliographic details

Source
Conference on empirical methods in natural language processing | 2018 | xiii, 170 p. | 5 pages
Authors
Isuru Gunasekara; Isar Nejadgholi
Original format: PDF
Chinese Library Classification: Programming, software engineering
Date indexed: 2022-08-20 23:27:42

Similar literature

1. Multi-label Arabic text classification in Online Social Networks [J]. Omar Ahmed, Mahmoud Tarek M., Abd-El-Hafeez Tarek. Information Systems. 2021, issue Sepa
2. Online multi-label dependency topic models for text classification [J]. Burkhardt Sophie, Kramer Stefan. Machine Learning. 2018, issue 5
3. Multi-label classification and knowledge extraction from oncology-related content on online social networks [J]. Hashemi Mahdi, Hall Margeret. Artificial Intelligence Review: An International Science and Engineering Journal. 2020, issue 8
4. A Review of Standard Text Classification Practices for Multi-label Toxicity Identification of Online Content [C]. Isuru Gunasekara, Isar Nejadgholi. Second workshop on abusive language online 2018. 2018
5. Leveraging Label Information in Representation Learning for Multi-Label Text Classification [D]. Wu, Jiayu. 2019
6. Automatic topic identification of health-related messages in online health community using text classification [O]. Yingjie Lu
7. Online multi-label dependency topic models for text classification [O]. Sophie Burkhardt, Stefan Kramer. 2017
