Faheem at NADI shared task: Identifying the dialect of Arabic tweet

Abstract

This paper describes Faheem (adj. of understand), our submission to the NADI (Nuanced Arabic Dialect Identification) shared task. Many Arabic dialects remain understudied because resources are scarce; the objective is to identify, at the country level, the Arabic dialect used in a tweet. We propose a machine learning approach that uses word-level n-gram (n = 1 to 3) and tf-idf features and feeds them to six different classifiers. We train the system on a data set of 21,000 tweets, provided by the organizers, covering twenty-one Arab countries. Our top-performing classifiers are Logistic Regression, Support Vector Machines, and Multinomial Naive Bayes (MNB). We achieved our best result, macro-F_1 = 0.151, using the MNB classifier.
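The pipeline named in the abstract (word-level n-grams with n = 1 to 3, tf-idf weighting, a Multinomial Naive Bayes classifier, macro-F_1 scoring) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation, and the abstract does not say which toolkit or settings they used. The smoothed idf formula, the Laplace smoothing value `alpha=1.0`, whitespace tokenization, all function names, and the four toy training sentences are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

def word_ngrams(text, n_min=1, n_max=3):
    """Word-level n-grams (n_min..n_max) of a whitespace-tokenized text."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def fit_idf(docs):
    """Smoothed idf per n-gram: ln((1 + N) / (1 + df)) + 1 (assumed variant)."""
    df = Counter(g for doc in docs for g in set(word_ngrams(doc)))
    n_docs = len(docs)
    return {g: math.log((1 + n_docs) / (1 + c)) + 1.0 for g, c in df.items()}

def tfidf(doc, idf):
    """tf-idf weights of one document; n-grams unseen in training are dropped."""
    tf = Counter(word_ngrams(doc))
    return {g: c * idf[g] for g, c in tf.items() if g in idf}

def train_mnb(docs, labels, idf, alpha=1.0):
    """Multinomial Naive Bayes over tf-idf weights with Laplace smoothing."""
    weights = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        weights[y].update(tfidf(doc, idf))  # accumulate per-class feature mass
    class_counts = Counter(labels)
    classes = {}
    for y, w in weights.items():
        total = sum(w.values()) + alpha * len(idf)
        classes[y] = (math.log(class_counts[y] / len(labels)), w, total)
    return {"alpha": alpha, "classes": classes}

def predict(model, doc, idf):
    """Return the class with the highest log-posterior score."""
    x = tfidf(doc, idf)
    alpha = model["alpha"]
    def score(y):
        log_prior, w, total = model["classes"][y]
        return log_prior + sum(v * math.log((w[g] + alpha) / total)
                               for g, v in x.items())
    return max(model["classes"], key=score)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes that occur."""
    scores = []
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy, made-up examples (not from the NADI data set).
train_docs = ["ازيك عامل ايه النهارده", "انت عامل ايه دلوقتي",
              "كيفك شو عم تعمل", "شو الاخبار كيفك اليوم"]
train_labels = ["Egypt", "Egypt", "Lebanon", "Lebanon"]

idf = fit_idf(train_docs)
model = train_mnb(train_docs, train_labels, idf)
test_docs = ["عامل ايه", "كيفك شو"]
predictions = [predict(model, d, idf) for d in test_docs]
score = macro_f1(["Egypt", "Lebanon"], predictions)
```

Macro averaging gives every country the same weight regardless of how many tweets it has, which is one reason scores on a 21-class task with uneven class sizes can be low even when frequent dialects are classified reasonably well.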

Bibliographic details

Source: Workshop on Arabic Natural Language Processing, 2020, pp. 282-287 (6 pages)
Authors: Nouf A. Al-Shenaifi; Aqil M. Azmi
Original format: PDF
Indexed: 2022-08-26 13:58:19

Similar literature

1. Cross-dialectal data sharing for acoustic modeling in Arabic speech recognition [J]. Kirchhoff K, Vergyri D. Speech Communication. 2005, No. 1
2. EveTAR: building a large-scale multi-task test collection over Arabic tweets [J]. Hasanain Maram, Suwaileh Reem, Elsayed Tamer. Information retrieval. 2018, No. 4
3. Arabic dialect sentiment analysis with ZERO effort. Case study: Algerian dialect [J]. Imane Guellil, Marcelo Mendoza, Faical Azouaou. Inteligencia Artificial: Ibero-American Journal of Artificial Intelligence. 2020, No. 65
4. NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task [C]. Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany. Workshop on Arabic Natural Language Processing. 2021
5. Is Every Tweet Created Equal? A Framework to Identify Relevant Tweets for Business Research [D]. Chee, Thad. 2017
6. We tweet Arabic; I tweet English: self-concept language and social media [O]. Justin Thomas, Aamna Al-Shehhi, Marwa Al-Ameri. 2019
7. Team JUST at the MADAR Shared Task on Arabic Fine-Grained Dialect Identification [O]. Bashar Talafha, Ali Fadel, Mahmoud Al-Ayyoub. 2019
