International Workshop on Semantic Evaluation

NUIG at SemEval-2020 Task 12: Pseudo labelling for offensive content classification


Abstract

This work addresses the classification problem defined by sub-task A (English only) of the OffensEval 2020 challenge. We used a semi-supervised approach to classify tweets as offensive (OFF) or not offensive (NOT). Because the OffensEval 2020 dataset is only loosely labelled, with confidence scores produced by unsupervised models, we used the previous year's Offensive Language Identification Dataset (OLID) to label it. Our approach uses pseudo-labelling to annotate the OffensEval 2020 dataset. We trained four text classifiers on OLID and used the one with the highest macro-averaged F1-score to pseudo-label the OffensEval 2020 dataset. That same best-performing model was then trained on the combination of OLID and the pseudo-labelled OffensEval 2020 data. We evaluated the classifiers on the OLID and OffensEval 2020 datasets using precision and recall, with macro-averaged F1-score as the primary evaluation metric.
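The pipeline described in the abstract (train several classifiers on labelled data, pick the one with the best macro-averaged F1, pseudo-label the loosely labelled data, then retrain on the combined set) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `KeywordClassifier` is a toy stand-in and not one of the paper's four classifiers, `pseudo_label_pipeline` is a hypothetical helper, and the tiny example texts are invented, not drawn from OLID or OffensEval 2020.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=("OFF", "NOT")):
    """Macro-averaged F1: the unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


class KeywordClassifier:
    """Toy stand-in for a text classifier (hypothetical, not from the paper).

    It flags a text as OFF if it contains any token seen in at least
    `min_count` OFF training texts and in fewer NOT training texts.
    """

    def __init__(self, min_count=1):
        self.min_count = min_count
        self.off_tokens = set()

    def fit(self, texts, labels):
        off, not_off = Counter(), Counter()
        for text, label in zip(texts, labels):
            target = off if label == "OFF" else not_off
            target.update(set(text.lower().split()))
        self.off_tokens = {
            tok for tok, n in off.items()
            if n >= self.min_count and n > not_off[tok]
        }
        return self

    def predict(self, texts):
        return [
            "OFF" if self.off_tokens & set(t.lower().split()) else "NOT"
            for t in texts
        ]


def pseudo_label_pipeline(candidates, train, dev, unlabelled):
    """Select the best candidate by macro-F1, pseudo-label, then retrain.

    `candidates` maps a name to a zero-argument factory returning a fresh
    model with fit/predict methods (an assumed interface).
    """
    train_x, train_y = train
    dev_x, dev_y = dev
    # Step 1: train every candidate on the labelled set, score on held-out data.
    scores = {
        name: macro_f1(dev_y, make().fit(train_x, train_y).predict(dev_x))
        for name, make in candidates.items()
    }
    best_name = max(scores, key=scores.get)
    # Step 2: the best model assigns pseudo labels to the unlabelled texts.
    best = candidates[best_name]().fit(train_x, train_y)
    pseudo = best.predict(unlabelled)
    # Step 3: retrain the same model type on labelled + pseudo-labelled data.
    final = candidates[best_name]().fit(train_x + unlabelled, train_y + pseudo)
    return best_name, pseudo, final


# Invented toy data standing in for the labelled and loosely labelled sets.
train = (["you are an idiot", "have a nice day",
          "what an idiot move", "nice work today"],
         ["OFF", "NOT", "OFF", "NOT"])
dev = (["such an idiot", "a nice idea"], ["OFF", "NOT"])
unlabelled = ["total idiot", "lovely day"]
candidates = {
    "kw_min1": lambda: KeywordClassifier(min_count=1),
    "kw_min3": lambda: KeywordClassifier(min_count=3),
}

best_name, pseudo, final = pseudo_label_pipeline(candidates, train, dev, unlabelled)
print(best_name, pseudo)
```

In a realistic setting the candidate factories would wrap actual text classifiers, and the held-out split would come from the labelled dataset. The sketch only shows how model selection by macro-F1 feeds the pseudo-labelling and retraining steps.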
