QCRI @ DSL 2016: Spoken Arabic Dialect Identification Using Textual Features

Abstract

The paper describes the QCRI submissions to the shared task of automatically classifying Arabic dialects into five Arabic variants, namely Egyptian, Gulf, Levantine, North-African (Maghrebi), and Modern Standard Arabic (MSA). The relatively small training set was generated automatically by an ASR system. To avoid over-fitting on such small data, we selected and designed features that capture the morphological essence of the different dialects. We submitted four runs to the Arabic sub-task. For all runs, we used a combined feature vector of character bigrams, trigrams, 4-grams, and 5-grams. We tried several machine-learning algorithms, namely Logistic Regression, Naive Bayes, Neural Networks, and Support Vector Machines (SVM) with linear and string kernels. Our submitted runs used an SVM with a linear kernel. In the closed submission, we obtained the best accuracy (0.5136) and the third-best weighted F1 score, less than 0.002 below the best system.
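The runs described above map each transcript to one combined vector of character 2- to 5-gram counts, classify it with a linear SVM, and are scored by accuracy and weighted F1. The Python sketch below illustrates that kind of pipeline; it is not the authors' code. The abstract does not say how n-gram counts were weighted or which SVM implementation was used, so the sketch assumes L2-normalised raw counts, an added bias feature, one-vs-rest training over the five dialect labels, and a Pegasos-style stochastic sub-gradient solver for the hinge loss as a stand-in for an off-the-shelf SVM library. All names (BIAS, char_ngrams, featurize, train_binary, train_one_vs_rest, predict, weighted_f1) and the values of lam, epochs and seed are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical constant feature acting as the intercept; at 8 characters it
# cannot collide with any 2- to 5-character n-gram.
BIAS = "__bias__"


def char_ngrams(text, n_min=2, n_max=5):
    """Count character n-grams of every length n_min..n_max in one combined bag."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def featurize(text):
    """Assumption: L2-normalise the combined counts, then append the bias feature."""
    counts = char_ngrams(text)
    norm = math.sqrt(sum(v * v for v in counts.values())) or 1.0
    vec = {k: v / norm for k, v in counts.items()}
    vec[BIAS] = 1.0
    return vec


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_binary(X, y, lam=0.01, epochs=20, seed=0):
    """Linear SVM via Pegasos-style sub-gradient steps (a stand-in solver)."""
    rng = random.Random(seed)
    w, t = {}, 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            # Hinge-loss check uses the weights from before this step's update.
            violated = y[i] * dot(w, X[i]) < 1
            for k in w:  # shrinkage from the L2 regulariser
                w[k] *= 1.0 - eta * lam
            if violated:  # sub-gradient step towards the misclassified side
                for k, v in X[i].items():
                    w[k] = w.get(k, 0.0) + eta * y[i] * v
    return w


def train_one_vs_rest(texts, labels, **kwargs):
    """Train one binary linear SVM per dialect label."""
    X = [featurize(doc) for doc in texts]
    return {
        c: train_binary(X, [1 if lab == c else -1 for lab in labels], **kwargs)
        for c in sorted(set(labels))
    }


def predict(model, text):
    """Return the label whose binary classifier gives the highest score."""
    x = featurize(text)
    return max(model, key=lambda c: dot(model[c], x))


def weighted_f1(gold, pred):
    """Per-class F1 averaged with weights proportional to gold-label support."""
    score = 0.0
    for c in set(gold):
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += f1 * (tp + fn) / len(gold)
    return score
```

Typical use would be model = train_one_vs_rest(train_texts, train_labels), then preds = [predict(model, s) for s in test_texts] and weighted_f1(test_labels, preds), where train_texts, train_labels, test_texts and test_labels are placeholder names for the ASR transcripts and their dialect labels. Rescaling every weight at each step keeps the sketch short but costs time proportional to the vocabulary per update; a practical solver would keep a single scale factor instead.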

Bibliographic Information

Source: Workshop on NLP for Similar Languages, Varieties and Dialects | 2016 | pp. 221-226 | 6 pages
Authors: Mohamed Eldesouki; Fahim Dalvi; Hassan Sajjad; Kareem Darwish
Original format: PDF

Similar Documents

1. Prosody-based Spoken Algerian Arabic Dialect Identification [J]. Soumia Bougrine, Hadda Cherroun, Djelloul Ziadi. Procedia Computer Science, 2018, no. 1.

2. Spoken Arabic dialect recognition using X-vectors [J]. Abualsoud Hanani, Rabee Naser. Natural Language Engineering, 2020, no. Pta6.

3. Gender identification for Egyptian Arabic dialect in Twitter using deep learning models [J]. Shereen ElSayed, Mona Farouk. Egyptian Informatics Journal, 2020, no. 3.

4. QCRI @ DSL 2016: Spoken Arabic Dialect Identification Using Textual Features [C]. Mohamed Eldesouki, Fahim Dalvi, Hassan Sajjad, Kareem Darwish. Workshop on NLP for Similar Languages, Varieties and Dialects, 2016.

5. Dialect influence and the use of dialect features across informal and formal tasks in the spoken text and written text of African American students enrolled in an urban high school [D]. Maria Teresa Finizio. 2001.

6. Morphological structure in the Arabic mental lexicon: Parallels between standard and dialectal Arabic [O]. Sami Boudelaa, William D. Marslen-Wilson.

7. MIT-QCRI Arabic Dialect Identification System for the 2017 Multi-Genre Broadcast Challenge [O]. Suwon Shon, Ahmed Ali, James Glass. 2017.