Venue: International Workshop on Semantic Evaluation

CoLi at UdS at SemEval-2020 Task 12: Offensive Tweet Detection with Ensembling



Abstract

With today's proliferation of maliciously intended communication across all social media platforms, finding ways of effectively combating these messages grows increasingly important. We present our submission and results for SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020), where we participated in offensive tweet classification tasks in English, Arabic, Greek, Turkish and Danish. Our approach combined classical machine learning architectures, such as support vector machines and logistic regression, in an ensemble with a multilingual transformer-based model (XLM-R). The transformer model is trained on all languages combined in order to create a fully multilingual model which can leverage knowledge between languages. The hyperparameters of the machine learning models are fine-tuned, and the statistically best-performing ones are included in the final ensemble. We further discuss the results of our model and see that our broad approach provides competitive but not task-winning performance. We also include an error analysis and potential improvements for future work.
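The abstract does not say how the individual models' outputs are combined. As a minimal sketch, assuming simple majority voting over hard labels (an assumption, not necessarily the authors' method), an ensemble of three classifiers could look like the following. The function name, the variable names, the "OFF"/"NOT" label strings and the prediction lists are all hypothetical and only illustrate the idea.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model label lists into one label per tweet by majority vote.

    `predictions` holds one list of labels per model; all lists must
    cover the same tweets in the same order.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_tweets = len(predictions[0])
    if any(len(p) != n_tweets for p in predictions):
        raise ValueError("all models must label the same number of tweets")

    combined = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the highest count; on a tie,
        # the label seen first (i.e. from the earliest-listed model) wins.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three already-trained models on four tweets.
# Assumption: "OFF" marks an offensive tweet and "NOT" a non-offensive one.
svm_preds = ["OFF", "NOT", "NOT", "OFF"]
logreg_preds = ["OFF", "NOT", "OFF", "NOT"]
xlmr_preds = ["NOT", "NOT", "OFF", "OFF"]

ensemble_preds = majority_vote([svm_preds, logreg_preds, xlmr_preds])
print(ensemble_preds)
```

With an odd number of models and two classes, ties cannot occur. Weighted or probability-based (soft) voting would be an equally plausible alternative under the same assumptions.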
