Automatic Arabic Dialect Classification Using Deep Learning Models

Leena Lulu; Ashraf Elnagar

Abstract

The widespread use of social media and the high availability of internet access have recently produced textual data that differ considerably from the formal, standard data on the Web. These data include various Arabic dialects, which are the native spoken languages of Arabic speakers. The presence of written dialectal Arabic on the Web has brought many new opportunities, as well as challenges, for machine learning and Arabic language processing. Identifying this type of informal data is crucial to several applications, such as sentiment analysis and machine translation. However, the standard NLP tools developed for traditional data fall short because of the nature of dialectal text. Deep learning tools have proven to be very effective in processing dialectal text from social media. In this paper, we consider a variety of deep learning models for the automatic classification of Arabic dialectal text. We use a free, large, manually annotated dataset known as the Arabic Online Commentary (AOC), which includes several Dialectal Arabic (DA) varieties along with Modern Standard Arabic (MSA) [3]. We consider the most frequent dialects in the dataset, namely Egyptian (EGP), Levantine (LEV), and Gulf, including Iraqi (GLF). Four different deep neural network models have been implemented to examine the Arabic dialect classification problem for each pair of the three dialects (binary classification experiments), as well as in one ternary classification experiment that includes all three dialects together. The results show varying but promising performance of the models for each pair of dialects. Furthermore, a closer examination of the manually annotated AOC dataset has been carried out, and we conclude that the AOC annotated sentences are in serious need of thorough review and refinement, since AOC is an important benchmark dataset in the field.
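The experimental layout described in the abstract (one binary experiment per pair of dialects, plus one ternary experiment) can be enumerated with a short sketch. This is only an illustration under stated assumptions: the function names `experiment_label_sets` and `select_examples` are hypothetical, the data are assumed to arrive as (text, label) pairs, and nothing here reflects the authors' actual code or the four network models they trained.

```python
from itertools import combinations

# Dialect labels as abbreviated in the abstract.
DIALECTS = ("EGP", "LEV", "GLF")


def experiment_label_sets(dialects=DIALECTS):
    """Return the label set of every experiment: all pairs, then all dialects."""
    binary = list(combinations(dialects, 2))  # one entry per pair of dialects
    ternary = [tuple(dialects)]               # single experiment with all three
    return binary + ternary


def select_examples(examples, labels):
    """Keep only the (text, label) pairs whose label is part of this experiment.

    `examples` is assumed to be an iterable of (text, label) tuples; this
    layout is a hypothetical choice, not the AOC file format.
    """
    allowed = set(labels)
    return [(text, label) for text, label in examples if label in allowed]


if __name__ == "__main__":
    for labels in experiment_label_sets():
        print(f"{len(labels)}-way: {' vs '.join(labels)}")
```

Each label set would then be used to filter the annotated sentences before training one classifier per experiment; how the subsets are split and balanced is not stated in the abstract.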

Bibliographic details

Source: Procedia Computer Science | 2018, Issue 1 | 8 pages
Authors: Leena Lulu; Ashraf Elnagar
Original format: PDF
Chinese Library Classification: computing technology, computer technology
Keywords: deep learning models; classification; Arabic dialects; AOC dataset
