...
Journal of Software Engineering and Applications

A HMM-Based System To Diacritize Arabic Text

Abstract

Arabic belongs to the Semitic language family, and from a Natural Language Processing point of view its sentence structure is entirely different. In such languages, two different words may share an identical spelling while their pronunciations and meanings are completely different. To remove this ambiguity, special marks are placed above or below the letters to indicate the correct pronunciation. These marks are called diacritics, and a language that uses them is called a diacritized language. This paper presents a system for diacritizing Arabic text using Hidden Markov Models (HMMs). The system employs the well-known HMM Toolkit (HTK). Each diacritic is represented by a separate model. The concatenation of the output models is coupled with the input character sequence to form the fully diacritized text. The performance of the proposed system is assessed on a data corpus of more than 24,000 sentences.
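The abstract describes choosing a diacritic for each input character and then coupling the resulting label sequence with the characters. As a rough illustration only, the sketch below frames this as a plain discrete HMM decoded with the Viterbi algorithm: hidden states are diacritics and observations are the undiacritized characters. This is an assumed simplification, not the paper's HTK configuration (where each diacritic is its own model); the label subset, the helper names `viterbi` and `couple`, and all probabilities are hypothetical and untrained.

```python
import math

# Hypothetical label set: a few Arabic diacritics plus "" for "no diacritic".
FATHA, DAMMA, KASRA, SUKUN, NONE = "\u064E", "\u064F", "\u0650", "\u0652", ""
STATES = [FATHA, DAMMA, KASRA, SUKUN, NONE]
FLOOR = 1e-6  # probability assumed for unseen (state, character) pairs


def viterbi(chars, start_p, trans_p, emit_p):
    """Return the most likely diacritic for each character (one hidden state per character)."""
    if not chars:
        return []

    def emit(s, ch):
        return math.log(emit_p[s].get(ch, FLOOR))

    # Each column maps state -> (best log-probability so far, previous state).
    columns = [{s: (math.log(start_p[s]) + emit(s, chars[0]), None) for s in STATES}]
    for ch in chars[1:]:
        prev_col = columns[-1]
        col = {}
        for s in STATES:
            prev, score = max(
                ((p, prev_col[p][0] + math.log(trans_p[p][s])) for p in STATES),
                key=lambda item: item[1],
            )
            col[s] = (score + emit(s, ch), prev)
        columns.append(col)

    # Backtrack from the best final state to recover the full label path.
    state = max(STATES, key=lambda s: columns[-1][s][0])
    path = [state]
    for col in reversed(columns[1:]):
        state = col[state][1]
        path.append(state)
    return path[::-1]


def couple(chars, diacritics):
    """Attach each predicted diacritic directly after its base character."""
    return "".join(ch + d for ch, d in zip(chars, diacritics))


# Toy parameters, made up for illustration (not trained on any corpus).
WORD = "\u0643\u062A\u0628"  # k-t-b, undiacritized
START = {s: 1 / len(STATES) for s in STATES}
TRANS = {p: {s: 1 / len(STATES) for s in STATES} for p in STATES}
EMIT = {s: {ch: (0.3 if s == FATHA else 0.1) for ch in WORD} for s in STATES}

if __name__ == "__main__":
    labels = viterbi(WORD, START, TRANS, EMIT)
    print(couple(WORD, labels))
```

With these toy numbers every character favors fatha, so the decoder returns the reading "kataba"; a real system would estimate transition and emission statistics from a diacritized corpus, and the final `couple` step mirrors the coupling of output labels with the input character sequence that the abstract mentions.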
