Conference: Workshop on Arabic Natural Language Processing

SHAKKIL: An Automatic Diacritization System for Modern Standard Arabic Texts

Abstract

This paper presents SHAKKIL, a system for automatically diacritizing Arabic texts. The system handles the diacritization problem at two levels: morphological processing and syntactic processing. The morphological disambiguation algorithm relies on four layers: a uni-morphological form layer, a rule-based morphological disambiguation layer, a statistical disambiguation layer, and an Out Of Vocabulary (OOV) layer. The syntactic disambiguation algorithm detects case-ending diacritics using a rule-based approach that simulates shallow parsing. An annotated corpus is used to extract the Arabic linguistic rules, build the language models, and test the system output. The system is considered a good trial of the interaction between the rule-based and statistical approaches, in which the rules help the statistics detect the right diacritization and vice versa. At this point, the morphological Word Error Rate (WER) is 4.56%, the morphological Diacritic Error Rate (DER) is 1.88%, and the syntactic WER is 9.36%. The best WER is 14.78%, compared with the best published results of 11.68% (Abandah et al., 2015), 12.90% (Rashwan et al., 2015), and 13.60% (Habash et al., 2009).
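The four-layer morphological cascade described in the abstract can be pictured as a chain of fallbacks, where each word is passed to the next layer only if the previous one could not decide. The sketch below is one plausible reading of that idea and not the authors' implementation: the lexicon, the counts, the single rule, the simplified romanization, and every function name are hypothetical, and the OOV layer is a placeholder that returns the word unchanged.

```python
from typing import List, Optional, Tuple

Candidate = Tuple[str, str]  # (diacritized form, part of speech)

# Hypothetical toy lexicon in a simplified romanization:
# undiacritized word -> candidate analyses. Illustrative entries only.
LEXICON = {
    "fy": [("fI", "prep")],
    "ktAb": [("kitAb", "noun")],
    "ktb": [("kataba", "verb"), ("kutub", "noun"), ("kutiba", "verb")],
}

# Hypothetical frequency counts standing in for a trained language model.
COUNTS = {"kataba": 50, "kutub": 30, "kutiba": 5}


def unique_form_layer(cands: List[Candidate], prev_pos: Optional[str]) -> Optional[Candidate]:
    """Layer 1: accept a word that has exactly one possible analysis."""
    return cands[0] if len(cands) == 1 else None


def rule_layer(cands: List[Candidate], prev_pos: Optional[str]) -> Optional[Candidate]:
    """Layer 2: one illustrative rule, a preposition is followed by a noun."""
    if prev_pos == "prep":
        nouns = [c for c in cands if c[1] == "noun"]
        if len(nouns) == 1:
            return nouns[0]
    return None


def statistical_layer(cands: List[Candidate], prev_pos: Optional[str]) -> Optional[Candidate]:
    """Layer 3: pick the most frequent candidate (assumed unigram counts)."""
    if not cands:
        return None
    return max(cands, key=lambda c: COUNTS.get(c[0], 0))


def oov_layer(word: str) -> Candidate:
    """Layer 4: placeholder for unknown words; leaves the word undiacritized."""
    return (word, "unknown")


def diacritize(words: List[str]) -> List[str]:
    """Run each word through the layers in order; the first decision wins."""
    output = []
    prev_pos = None
    for word in words:
        cands = LEXICON.get(word, [])
        choice = None
        for layer in (unique_form_layer, rule_layer, statistical_layer):
            choice = layer(cands, prev_pos)
            if choice is not None:
                break
        if choice is None:
            choice = oov_layer(word)
        output.append(choice[0])
        prev_pos = choice[1]
    return output


if __name__ == "__main__":
    print(diacritize(["fy", "ktb"]))    # rule layer resolves "ktb" after a preposition
    print(diacritize(["ktb", "ktAb"]))  # statistical layer, then unique-form layer
```

Ordering the layers from most to least certain means the statistical model is consulted only for words that the lexicon and the rules leave ambiguous, which is one way rules and statistics can support each other as the abstract describes.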
