
An Approach for Arabic Diacritization



Abstract

Modern Standard Arabic (MSA) contains optional diacritical marks (diacritics; in Arabic, harakat), which have become less common in Arabic books, newspapers, and other written media. Diacritics are very important for the readability and comprehensibility of texts. Their absence adds to the lexical, morphological, and semantic ambiguity of the text. In this paper, we present an automatic diacritization system for Arabic that uses Hidden Markov Models with the Viterbi algorithm, with probabilities learned from diacritized Arabic texts. The corpus used was composed mostly of religious texts. Our results were satisfactory, achieving a precision of up to 80% at the word level.
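To make the HMM-plus-Viterbi idea concrete, here is a minimal sketch and not the authors' system. The abstract does not say how states and observations are defined, so the sketch assumes a character-level model: each hidden state is a letter together with its diacritics, each observation is the bare letter, and emissions are deterministic (a state always emits its own base letter). The function names (split_units, train, viterbi), the add-one smoothing, and the toy corpus are all illustrative assumptions.

```python
from collections import defaultdict
import math

# Arabic harakat from fathatan (U+064B) through sukun (U+0652).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def split_units(word):
    """Split a diacritized word into units of one letter plus its diacritics."""
    units = []
    for ch in word:
        if ch in DIACRITICS and units:
            units[-1] += ch
        else:
            units.append(ch)
    return units


def strip_diacritics(text):
    """Remove all harakat, giving the undiacritized observation sequence."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def train(words):
    """Count unit-to-unit transitions and the units seen for each bare letter."""
    trans = defaultdict(lambda: defaultdict(int))
    candidates = defaultdict(set)
    for word in words:
        prev = "<s>"  # assumed start-of-word state
        for unit in split_units(word):
            trans[prev][unit] += 1
            candidates[unit[0]].add(unit)
            prev = unit
    return trans, candidates


def log_prob(trans, prev, unit, vocab_size):
    """Add-one smoothed log transition probability (smoothing is an assumption)."""
    total = sum(trans[prev].values())
    return math.log((trans[prev][unit] + 1) / (total + vocab_size))


def viterbi(bare_word, trans, candidates):
    """Return the most probable diacritized form of an undiacritized word."""
    vocab_size = sum(len(v) for v in candidates.values()) or 1
    # best[state] = (log score, path) of the best path ending in that state.
    best = {"<s>": (0.0, [])}
    for letter in bare_word:
        # A letter never seen in training is left undiacritized.
        states = candidates.get(letter, {letter})
        new_best = {}
        for unit in sorted(states):
            score, path = max(
                (s + log_prob(trans, prev, unit, vocab_size), p)
                for prev, (s, p) in best.items()
            )
            new_best[unit] = (score, path + [unit])
        best = new_best
    return "".join(max(best.values())[1])


if __name__ == "__main__":
    # Toy corpus: "kataba" twice and "kutub" (with final sukun) once.
    corpus = [
        "\u0643\u064E\u062A\u064E\u0628\u064E",
        "\u0643\u064E\u062A\u064E\u0628\u064E",
        "\u0643\u064F\u062A\u064F\u0628\u0652",
    ]
    trans, candidates = train(corpus)
    print(viterbi(strip_diacritics(corpus[0]), trans, candidates))
```

Because emissions are deterministic in this sketch, only the transition probabilities influence the decoded path. A word-level model, or one with contextual emissions, would need an explicit emission table in addition to the transitions.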
