
Improved Sentence-Level Arabic Dialect Classification


Abstract

This paper presents work on improved sentence-level dialect classification of Egyptian Arabic (ARZ) vs. Modern Standard Arabic (MSA). Our approach is based on binary feature functions that can be implemented with a minimal amount of task-specific knowledge. We train a feature-rich linear classifier using a linear support vector machine (linear SVM). Our best system achieves an accuracy of 89.1% on the Arabic Online Commentary (AOC) dataset (Zaidan and Callison-Burch, 2011) using 10-fold stratified cross-validation: a 1.3% absolute accuracy improvement over the results published by Zaidan and Callison-Burch (2014). We also evaluate the classifier on dialect data from an additional data source. Here, we find that features which measure the informality of a sentence actually decrease classification accuracy significantly.
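The pipeline described in the abstract (binary feature functions, a linear SVM, and stratified k-fold evaluation) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper: the feature set (token identity plus character bigrams), the Pegasos-style subgradient trainer standing in for whatever SVM toolkit the authors used, the label encoding (+1 for ARZ, -1 for MSA), and all function names.

```python
import random
from collections import defaultdict


def binary_features(sentence, n=2):
    """Return the set of binary features that fire for a sentence.

    Hypothetical feature set: token identity and character n-grams.
    A feature is either present (1) or absent (0), so a set suffices.
    """
    feats = set()
    for tok in sentence.split():
        feats.add("w=" + tok)
        padded = "^" + tok + "$"  # mark token boundaries
        for i in range(len(padded) - n + 1):
            feats.add("c=" + padded[i:i + n])
    return feats


def stratified_folds(labels, k, seed=0):
    """Split example indices into k folds with similar label proportions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, y in enumerate(labels):
        by_label[y].append(idx)
    folds = [[] for _ in range(k)]
    for y in sorted(by_label):
        idxs = by_label[y]
        rng.shuffle(idxs)
        # Deal each class round-robin so every fold gets its share.
        for j, idx in enumerate(idxs):
            folds[j % k].append(idx)
    return folds


def train_linear_svm(X, y, epochs=20, lam=0.01, seed=0):
    """Pegasos-style subgradient descent on the regularized hinge loss.

    X is a list of feature sets, y a list of labels in {-1, +1}.
    This is an illustrative stand-in, not the paper's trainer.
    """
    rng = random.Random(seed)
    w = {}
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(w.get(f, 0.0) for f in X[i])
            for f in w:  # shrinkage from the L2 regularizer
                w[f] *= 1.0 - eta * lam
            if margin < 1:  # hinge loss is active: move toward the example
                for f in X[i]:
                    w[f] = w.get(f, 0.0) + eta * y[i]
    return w


def cross_validate(sentences, labels, k=10):
    """Return accuracy pooled over k stratified held-out folds."""
    X = [binary_features(s) for s in sentences]
    correct = 0
    for held_out in stratified_folds(labels, k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        w = train_linear_svm([X[i] for i in train], [labels[i] for i in train])
        for i in held_out:
            score = sum(w.get(f, 0.0) for f in X[i])
            pred = 1 if score >= 0 else -1
            correct += pred == labels[i]
    return correct / len(X)
```

Calling `cross_validate(sentences, labels, k=10)` on a list of sentences and their +1/-1 labels returns a single accuracy figure, which is the kind of number the abstract reports. In practice a tested library implementation of the SVM and the fold splitter would be preferable to this hand-rolled version.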
