International Conference on Knowledge and Systems Engineering

From Universal Language Model to Downstream Task: Improving RoBERTa-Based Vietnamese Hate Speech Detection



Abstract

Natural language processing (NLP) is a fast-growing field of artificial intelligence. Since Google introduced the Transformer [32] in 2017, this architecture has inspired a large number of language models such as BERT, GPT, and ELMo. These models were trained on huge datasets and achieved state-of-the-art results on natural language understanding tasks. However, fine-tuning a pre-trained language model on much smaller datasets for downstream tasks requires a carefully designed pipeline to mitigate dataset problems such as scarce training data and class imbalance. In this paper, we propose a pipeline to adapt the general-purpose RoBERTa language model to a specific text classification task: Vietnamese Hate Speech Detection. We first tune PhoBERT [9] on our dataset by re-training the model on the Masked Language Model (MLM) task; then, we employ its encoder for text classification. In order to preserve pre-trained weights while learning new feature representations, we further utilize different training techniques: layer freezing, block-wise learning rate, and label smoothing. Our experiments show that the proposed pipeline boosts performance significantly, achieving a new state of the art on the Vietnamese Hate Speech Detection (HSD) campaign with an F1 score of 0.7221.
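The first step the abstract describes, re-training PhoBERT on the task corpus with the MLM objective, can be sketched with the HuggingFace Transformers library. This is a minimal illustration of standard masked-language-model domain adaptation, not the authors' training script: the public `vinai/phobert-base` checkpoint, the corpus file name `hsd_corpus.txt`, and all hyperparameters below are assumptions.

```python
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    LineByLineTextDataset,
    Trainer,
    TrainingArguments,
)

# PhoBERT's public base checkpoint; the paper's exact starting
# weights may differ.
tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
model = AutoModelForMaskedLM.from_pretrained("vinai/phobert-base")

# One (word-segmented) comment per line; "hsd_corpus.txt" is a
# hypothetical filename standing in for the task corpus.
dataset = LineByLineTextDataset(
    tokenizer=tokenizer, file_path="hsd_corpus.txt", block_size=256
)

# Standard MLM objective: randomly mask 15% of the tokens.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="phobert-hsd-mlm", num_train_epochs=3),
    data_collator=collator,
    train_dataset=dataset,
)
trainer.train()
trainer.save_model("phobert-hsd-mlm")
```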
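The three fine-tuning techniques the abstract names can likewise be sketched in PyTorch. Everything below is illustrative: the number of frozen blocks, the learning rates, the decay factor, and the smoothing value are assumptions rather than the paper's reported hyperparameters; `phobert-hsd-mlm` refers to the checkpoint saved in the previous sketch, and `num_labels=3` assumes the CLEAN / OFFENSIVE / HATE label set of the VLSP HSD shared task.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Load the MLM-adapted encoder with a fresh classification head.
model = AutoModelForSequenceClassification.from_pretrained(
    "phobert-hsd-mlm", num_labels=3
)

# 1) Layer freezing: fix the embeddings and the lowest encoder blocks
#    so their pre-trained weights are preserved (4 blocks is an
#    illustrative choice).
for param in model.roberta.embeddings.parameters():
    param.requires_grad = False
for block in model.roberta.encoder.layer[:4]:
    for param in block.parameters():
        param.requires_grad = False

# 2) Block-wise learning rate: each encoder block gets a geometrically
#    smaller learning rate the closer it sits to the input.
base_lr, decay = 2e-5, 0.9
layers = model.roberta.encoder.layer
param_groups = []
for i, block in enumerate(layers):
    trainable = [p for p in block.parameters() if p.requires_grad]
    if trainable:  # skip the fully frozen blocks
        param_groups.append(
            {"params": trainable, "lr": base_lr * decay ** (len(layers) - 1 - i)}
        )
param_groups.append({"params": model.classifier.parameters(), "lr": base_lr})
optimizer = torch.optim.AdamW(param_groups)

# 3) Label smoothing: soften the one-hot targets to regularize the
#    classifier (0.1 is a common default, not the paper's value).
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
```

Filtering on `requires_grad` keeps the frozen parameters out of the optimizer, and the geometric decay gives the blocks nearest the input the smallest updates, which is the usual rationale for block-wise (discriminative) learning rates.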
