Journal: International Journal of Computing and Information Sciences

Automatic Diacritics Restoration for Dialectal Arabic Text


Abstract

This paper addresses the problem of missing diacritic marks in most written dialectal Arabic resources. Our aim is to implement a scalable and extensible platform for automatically restoring the diacritic marks of undiacritized dialectal Arabic text. Several rule-based and statistical techniques are proposed, including maximum likelihood estimation and statistical n-gram models. The platform also includes helper tools for text pre-processing and encoding conversion. The diacritization accuracy of each technique is evaluated in terms of Diacritic Error Rate (DER) and Word Error Rate (WER). The approach trains several n-gram models on different lexical units. The models were trained on a pool of both Modern Standard Arabic (MSA) and dialectal Arabic data.
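The abstract does not give implementation details, so the following is only a minimal sketch of how a maximum likelihood baseline and the two metrics could look. It assumes a word-level model that maps each undiacritized word to its most frequent diacritized form in the training data, a DER counted per letter, and a WER counted per word with at least one wrong diacritic. All function names are hypothetical, and the paper's exact metric conventions may differ.

```python
from collections import Counter, defaultdict

# Arabic diacritic code points U+064B (fathatan) through U+0652 (sukun).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def strip_diacritics(word):
    """Remove diacritic marks, keeping only the base letters."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


def train_mle(diacritized_words):
    """Map each bare word to its most frequent diacritized form (assumed baseline)."""
    counts = defaultdict(Counter)
    for word in diacritized_words:
        counts[strip_diacritics(word)][word] += 1
    return {bare: forms.most_common(1)[0][0] for bare, forms in counts.items()}


def diacritize(words, model):
    """Look up each bare word; unseen words are returned unchanged."""
    return [model.get(word, word) for word in words]


def split_letters(word):
    """Return [letter, marks] pairs, attaching each mark to the preceding letter."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:
                pairs[-1][1] += ch
        else:
            pairs.append([ch, ""])
    return pairs


def error_rates(reference, hypothesis):
    """Return (DER, WER), assuming both sides share the same base letters."""
    letter_errors = letters = word_errors = 0
    for ref, hyp in zip(reference, hypothesis):
        ref_pairs, hyp_pairs = split_letters(ref), split_letters(hyp)
        # Sort the marks so that shadda/vowel ordering does not count as an error.
        wrong = sum(sorted(r[1]) != sorted(h[1]) for r, h in zip(ref_pairs, hyp_pairs))
        letter_errors += wrong
        letters += len(ref_pairs)
        word_errors += wrong > 0
    return letter_errors / letters, word_errors / len(reference)


if __name__ == "__main__":
    kataba = "\u0643\u064e\u062a\u064e\u0628\u064e"  # k-t-b with fatha on each letter
    kutub = "\u0643\u064f\u062a\u064f\u0628"         # k-t-b with damma on the first two
    model = train_mle([kataba, kataba, kutub])
    reference = [kataba, kutub]
    hypothesis = diacritize([strip_diacritics(w) for w in reference], model)
    print(error_rates(reference, hypothesis))
```

A context-free lookup like this always picks the same form for an ambiguous word, which is the kind of limitation that n-gram models over words or smaller lexical units are meant to reduce.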
