International Workshop on Semantic Evaluation

Ferryman at SemEval-2020 Task 12: BERT-Based Model with Advanced Improvement Methods for Multilingual Offensive Language Identification



Abstract

Indiscriminately posting offensive remarks on social media may promote the occurrence of negative events such as violence, crime, and hatred. This paper examines different approaches and models for offensive tweet classification, which is part of the OffensEval 2020 competition (Zampieri et al., 2020; Zampieri et al., 2019b). The dataset is the Offensive Language Identification Dataset (OLID) (Zampieri et al., 2019a), which contains 14,200 annotated English tweets (Rosenthal et al., 2020). The main challenges in data preprocessing are the unbalanced class distribution, abbreviations, and emoji. To overcome these issues, methods such as hashtag segmentation, abbreviation replacement, and emoji replacement were adopted during preprocessing. The main task is divided into three sub-tasks, which are solved with a Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer, Bidirectional Encoder Representations from Transformers (BERT), and multi-dropout, respectively. Meanwhile, we applied different learning rates for different languages and tasks on both BERT and non-BERT models in order to obtain better results. Our team Ferryman ranked 18th, 8th, and 21st with an F1-score of 0.91152 on the English Sub-task A, Sub-task B, and Sub-task C, respectively. Furthermore, our team also ranked in the top 20 on Sub-task A for the other languages (Çöltekin, 2020; Sigurbergsson and Derczynski, 2020; Mubarak et al., 2020; Pitenis et al., 2020).
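The three preprocessing steps named above (hashtag segmentation, abbreviation replacement, emoji replacement) can be sketched as a small pipeline. This is a minimal stdlib-only illustration, not the authors' implementation: the lookup tables `ABBREVIATIONS` and `EMOJI_MAP` are tiny hypothetical stand-ins for the much larger mappings such a system would use, and the hashtag splitter assumes CamelCase hashtags.

```python
import re

# Hypothetical lookup tables for illustration; a real system would use
# far larger abbreviation and emoji dictionaries.
ABBREVIATIONS = {"u": "you", "idk": "i do not know", "lol": "laughing out loud"}
EMOJI_MAP = {"\U0001F600": "grinning face", "\U0001F620": "angry face"}

def segment_hashtag(tag: str) -> str:
    """Split a CamelCase hashtag body such as 'StopHate' into words."""
    words = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", tag)
    return " ".join(w.lower() for w in words)

def preprocess(tweet: str) -> str:
    # 1. Hashtag segmentation: "#StopHate" -> "stop hate"
    tweet = re.sub(r"#(\w+)", lambda m: segment_hashtag(m.group(1)), tweet)
    # 2. Abbreviation replacement on whitespace-separated tokens
    tweet = " ".join(ABBREVIATIONS.get(t.lower(), t) for t in tweet.split())
    # 3. Emoji replacement: map known emoji to textual descriptions
    for emo, desc in EMOJI_MAP.items():
        tweet = tweet.replace(emo, f" {desc} ")
    # Collapse any whitespace introduced by the replacements
    return re.sub(r"\s+", " ", tweet).strip()

print(preprocess("idk why u post this #StopHate \U0001F620"))
# -> "i do not know why you post this stop hate angry face"
```

The normalized text can then be fed either to a TF-IDF vectorizer or to BERT's tokenizer; converting emoji and hashtags to plain words keeps them inside the model's vocabulary instead of being dropped as unknown tokens.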