UnibucKernel Reloaded: First Place in Arabic Dialect Identification for the Second Year in a Row

Abstract

We present a machine learning approach that ranked first in the Arabic Dialect Identification (ADI) Closed Shared Task of the 2018 VarDial Evaluation Campaign. The proposed approach combines several kernels using multiple kernel learning. While most of our kernels are based on character p-grams (also known as n-grams) extracted from speech or phonetic transcripts, we also use a kernel based on dialectal embeddings generated from audio recordings by the organizers. In the learning stage, we independently employ Kernel Discriminant Analysis (KDA) and Kernel Ridge Regression (KRR). Preliminary experiments indicate that KRR provides better classification results. Our approach is shallow and simple, but the empirical results obtained in the 2018 ADI Closed Shared Task show that it achieves the best performance. Furthermore, our top macro-F_1 score (58.92%) is significantly better than the second best score (57.59%) in the 2018 ADI Shared Task, according to the statistical significance test performed by the organizers. Nevertheless, we obtain even better post-competition results (a macro-F_1 score of 62.28%) using the audio embeddings released by the organizers after the competition. With a very similar approach (that did not include phonetic features), we also ranked first in the ADI Closed Shared Task of the 2017 VarDial Evaluation Campaign, surpassing the second best method by 4.62%. We therefore conclude that our multiple kernel learning method is the best approach to date for Arabic dialect identification.
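The pipeline described in the abstract (character p-gram kernels, an embedding kernel, kernel combination, and KRR) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the kernel functions or the combination rule, so the intersection kernel on p-gram counts, the cosine kernel on embeddings, the unweighted kernel sum, the p-gram range, the regularization value, and the one-vs-all label encoding are all assumptions made here for illustration. The toy strings and two-dimensional "embeddings" are invented and are not data from the shared task.

```python
from collections import Counter
import numpy as np


def char_pgrams(text, p_min=2, p_max=3):
    """Count all character p-grams with p in [p_min, p_max] (assumed range)."""
    counts = Counter()
    for p in range(p_min, p_max + 1):
        for i in range(len(text) - p + 1):
            counts[text[i:i + p]] += 1
    return counts


def intersection_kernel(docs_a, docs_b):
    """K[i, j] = sum over shared p-grams of the smaller of the two counts."""
    feats_a = [char_pgrams(d) for d in docs_a]
    feats_b = [char_pgrams(d) for d in docs_b]
    K = np.zeros((len(feats_a), len(feats_b)))
    for i, fa in enumerate(feats_a):
        for j, fb in enumerate(feats_b):
            K[i, j] = sum(min(c, fb[g]) for g, c in fa.items() if g in fb)
    return K


def self_similarity(docs):
    """For the intersection kernel, k(x, x) is the total p-gram count of x."""
    return np.array([sum(char_pgrams(d).values()) for d in docs], dtype=float)


def combined_kernel(text_a, emb_a, text_b, emb_b):
    """Unweighted sum of a normalized text kernel and a cosine embedding kernel."""
    K_text = intersection_kernel(text_a, text_b)
    K_text /= np.sqrt(np.outer(self_similarity(text_a), self_similarity(text_b)))
    norms = np.outer(np.linalg.norm(emb_a, axis=1), np.linalg.norm(emb_b, axis=1))
    K_emb = (emb_a @ emb_b.T) / norms
    return K_text + K_emb


def krr_fit(K_train, labels, n_classes, lam=1.0):
    """Kernel ridge regression on one-vs-all targets in {-1, +1}."""
    Y = -np.ones((len(labels), n_classes))
    Y[np.arange(len(labels)), labels] = 1.0
    return np.linalg.solve(K_train + lam * np.eye(len(labels)), Y)


def krr_predict(K_test_train, alpha):
    """Pick the class whose regression score is highest."""
    return np.argmax(K_test_train @ alpha, axis=1)


# Invented toy data: two "dialects" with distinct character patterns.
train_text = ["aaabaaab", "abaaabaa", "xyzxyzxy", "zxyzxyzx"]
train_emb = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]])
labels = np.array([0, 0, 1, 1])
test_text = ["baaabaaa", "yzxyzxyz"]
test_emb = np.array([[0.8, 0.1], [0.1, 0.8]])

K_train = combined_kernel(train_text, train_emb, train_text, train_emb)
K_test = combined_kernel(test_text, test_emb, train_text, train_emb)
alpha = krr_fit(K_train, labels, n_classes=2)
pred = krr_predict(K_test, alpha)
print(pred)
```

On this toy data the first test item is assigned to class 0 and the second to class 1. Normalizing each kernel before summing keeps either view from dominating the combination purely because of its scale.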

Bibliographic record

Source
Fifth Workshop on NLP for Similar Languages, Varieties and Dialects | 2018 | pp. 77-87 | 11 pages
Conference location: Santa Fe (US)
Authors
Andrei M. Butnaru; Radu Tudor Ionescu
Author affiliations

Department of Computer Science, University of Bucharest, 14 Academiei, Bucharest, Romania

Original format: PDF
Language: English

Similar literature

1. Gender identification for Egyptian Arabic dialect in Twitter using deep learning models [J]. Shereen ElSayed, Mona Farouk. Egyptian Informatics Journal. 2020, Issue 3
2. Word-Level vs Sentence-Level Language Identification: Application to Algerian and Arabic Dialects [J]. Mohamed Lichouri, Mourad Abbas, Abed Alhakim Freihat, Procedia Computer Science. 2018, Issue 22
3. Prosody-based Spoken Algerian Arabic Dialect Identification [J]. Soumia Bougrine, Hadda Cherroun, Djelloul Ziadi. Procedia Computer Science. 2018, Issue 1
4. UnibucKernel Reloaded: First Place in Arabic Dialect Identification for the Second Year in a Row [C]. Andrei M. Butnaru, Radu Tudor Ionescu. Workshop on NLP for Similar Languages, Varieties and Dialects. 2018
5. Arabic Dialect Identification [D]. Al-Mannai, Kamela Ali. 2018
6. Morphological structure in the Arabic mental lexicon: Parallels between standard and dialectal Arabic [O]. Sami Boudelaa, William D. Marslen-Wilson
7. Hierarchical Deep Learning for Arabic Dialect Identification [O]. Gael de Francony, Victor Guichard, Praveen Joshi, 2019
