Conference on Empirical Methods in Natural Language Processing

Improved Arabic Dialect Classification with Social Media Data

Abstract

Arabic dialect classification has been an important and challenging problem for Arabic language processing, especially for social media text analysis and machine translation. In this paper, we propose an approach to improving Arabic dialect classification with semi-supervised learning: multiple classifiers are trained with weakly supervised, strongly supervised, and unsupervised data. Their combination yields significant and consistent improvements on two different test sets. Dialect classification accuracy improves by 5% over the strongly supervised classifier and by 20% over the weakly supervised classifier. Furthermore, when the improved dialect classifier is applied to build a Modern Standard Arabic (MSA) language model (LM), the new model is 70% smaller while English-Arabic translation quality improves by 0.6 BLEU points.
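The abstract does not say how the classifiers are combined. As a minimal sketch only, the snippet below assumes a weighted linear interpolation of per-class posteriors. The label names (`MSA`, `dialect`), the example probabilities, the weights, and the function names are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List

Posterior = Dict[str, float]  # maps a class label to its probability


def combine_posteriors(posteriors: List[Posterior], weights: List[float]) -> Posterior:
    """Weighted linear interpolation of class posteriors (an assumed scheme)."""
    if not posteriors or len(posteriors) != len(weights):
        raise ValueError("need one weight per classifier and at least one classifier")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    # Collect every label seen by any classifier.
    labels = set().union(*posteriors)
    combined = {
        label: sum(w * p.get(label, 0.0) for p, w in zip(posteriors, weights)) / total_weight
        for label in labels
    }

    # Renormalize in case an input distribution did not sum to exactly 1.
    norm = sum(combined.values())
    return {label: score / norm for label, score in combined.items()}


def classify(posteriors: List[Posterior], weights: List[float]) -> str:
    """Return the label with the highest combined probability."""
    combined = combine_posteriors(posteriors, weights)
    return max(combined, key=combined.get)


# Hypothetical outputs of three classifiers for one sentence.
weak = {"MSA": 0.60, "dialect": 0.40}          # weakly supervised
strong = {"MSA": 0.30, "dialect": 0.70}        # strongly supervised
unsupervised = {"MSA": 0.45, "dialect": 0.55}  # unsupervised

weights = [1.0, 2.0, 1.0]  # illustrative weights, trusting the strong classifier most
combined = combine_posteriors([weak, strong, unsupervised], weights)
label = classify([weak, strong, unsupervised], weights)
```

In practice such weights would be tuned on held-out data. Other combination schemes, such as voting or stacking, would fit the abstract's description equally well.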