NLPDove at SemEval-2020 Task 12: Improving Offensive Language Detection with Cross-lingual Transfer

Abstract

This paper describes our approach to the task of identifying offensive language in a multilingual setting. We investigate two data augmentation strategies: using additional semi-supervised labels with different thresholds and cross-lingual transfer with data selection. Leveraging the semi-supervised dataset resulted in performance improvements compared to the baseline trained solely with the manually annotated dataset. We propose a new metric, Translation Embedding Distance, to measure the transferability of instances for cross-lingual data selection. We also introduce various preprocessing steps tailored for social media text along with methods to fine-tune the pre-trained multilingual BERT (mBERT) for offensive language identification. Our multilingual systems achieved competitive results in Greek, Danish, and Turkish at OffensEval 2020.
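
To illustrate the threshold-based use of semi-supervised labels mentioned in the abstract, here is a minimal Python sketch. The abstract does not say how the thresholds are applied, so everything here is an assumption made for illustration: the `offensive_score` field, the two-sided cutoff, the function name `select_by_threshold`, and the values 0.2 and 0.8. It should not be read as the authors' actual procedure.

```python
from dataclasses import dataclass


@dataclass
class SoftLabeledExample:
    text: str
    # Hypothetical field: a model-assigned score in [0, 1] that the text is offensive.
    offensive_score: float


def select_by_threshold(examples, low, high):
    """Keep examples whose score is confidently low or high and give them hard labels.

    Assumption: scores >= high become label 1 (offensive), scores <= low become
    label 0 (not offensive), and everything in between is dropped as uncertain.
    """
    selected = []
    for ex in examples:
        if ex.offensive_score >= high:
            selected.append((ex.text, 1))
        elif ex.offensive_score <= low:
            selected.append((ex.text, 0))
    return selected


# Toy data, invented purely for this example.
examples = [
    SoftLabeledExample("example a", 0.95),
    SoftLabeledExample("example b", 0.50),
    SoftLabeledExample("example c", 0.05),
]
augmented = select_by_threshold(examples, low=0.2, high=0.8)
```

Under these assumptions, widening the gap between `low` and `high` keeps fewer but more confidently labeled instances, which is one way "different thresholds" could trade data volume against label noise.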

Bibliographic details

Source: International Workshop on Semantic Evaluation, 2020, pp. 1576-1586 (11 pages)
Authors: Hwijeen Ahn; Jimin Sun; Chan Young Park; Jungyun Seo
Original format: PDF
