Venue: Workshop on Arabic Natural Language Processing

Identifying Nuanced Dialect for Arabic Tweets with Deep Learning and Reverse Translation Corpus Extension System



Abstract

In this paper, we present our work for the NADI Shared Task (Abdul-Mageed et al., 2020): Nuanced Arabic Dialect Identification, Subtask 1: country-level dialect identification. We introduce a Reverse Translation Corpus Extension System (RTCES) to handle data imbalance, and we report results for several experimental approaches to word and document representation and for different model architectures. The top-scoring model was based on the Transformer-based Model for Arabic Language Understanding (AraBERT) (Antoun et al., 2020), used with our modified corpus, which we extended through reverse translation of the given Arabic tweets. The selected system achieved a macro-averaged F1 score of 20.34% on the test set, placing our team, CodeLyoko, 7th out of 18 teams on the final leaderboard.
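The abstract does not give RTCES's details, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes "reverse translation" means round-trip (back-)translation through a pivot language, and that minority country labels are topped up with paraphrased tweets until they match the largest class. The pivot translators are hypothetical `str -> str` callables standing in for whatever MT system was actually used, and the names `back_translate`, `extend_minority_classes`, and `macro_f1` are illustrative. A plain macro-F1 function is included because the metric averages per-class F1 scores with equal weight, which is why rare dialects matter as much as frequent ones.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple

Example = Tuple[str, str]            # (tweet_text, country_label)
Translator = Callable[[str], str]    # hypothetical MT function, e.g. Arabic -> pivot


def back_translate(text: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Round-trip a text through a pivot language to obtain a paraphrase."""
    return from_pivot(to_pivot(text))


def extend_minority_classes(
    data: List[Example],
    pivots: List[Tuple[Translator, Translator]],
    target: Optional[int] = None,
) -> List[Example]:
    """Add back-translated copies of tweets until each label reaches `target`.

    Assumption: `target` defaults to the size of the largest class. Each source
    tweet is paraphrased at most once per pivot pair, and paraphrases that are
    unchanged or already present are skipped, so a class may stay below target.
    """
    if not data:
        return []
    counts = Counter(label for _, label in data)
    goal = target if target is not None else max(counts.values())
    extended = list(data)
    seen = {text for text, _ in data}
    for label, count in counts.items():
        sources = [text for text, lab in data if lab == label]
        for to_pivot, from_pivot in pivots:
            for text in sources:
                if count >= goal:
                    break
                paraphrase = back_translate(text, to_pivot, from_pivot)
                if paraphrase in seen:
                    continue  # identical or duplicate paraphrase adds no information
                seen.add(paraphrase)
                extended.append((paraphrase, label))
                count += 1
    return extended


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over labels seen in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    total = 0.0
    for lab in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == lab and p == lab)
        fp = sum(1 for t, p in pairs if t != lab and p == lab)
        fn = sum(1 for t, p in pairs if t == lab and p != lab)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total += 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return total / len(labels)
```

For example, with `y_true = ["EG", "EG", "SA", "IQ"]` and `y_pred = ["EG", "SA", "SA", "EG"]`, the per-class F1 scores are 0.5, 2/3, and 0, so `macro_f1` returns 7/18 (about 0.389): the never-predicted `IQ` class pulls the average down as much as a frequent class would. Using several pivot languages is one assumed way to get more than one distinct paraphrase per tweet when the MT system is deterministic.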
