Conference: International Conference on Recent Advances in Natural Language Processing

Automatic diacritization of Tunisian dialect text using Recurrent Neural Network

Abstract

The absence of diacritical marks in Arabic text generally leads to morphological, syntactic, and semantic ambiguities. The problem is more pronounced for under-resourced languages such as the Tunisian dialect, which lacks basic tools and linguistic resources: sufficiently large corpora, multilingual dictionaries, and morphological and syntactic analyzers. Processing this dialect therefore faces greater challenges. Automatic diacritization of Modern Standard Arabic (MSA) text is one of the complex problems that deep neural networks can now address. Because the Tunisian dialect is under-resourced and closely resembles MSA, we propose investigating a recurrent neural network (RNN) for diacritizing this dialect. We compare this model with our previous CRF and SMT models (24), using the same dialect corpus. Experiments show that the RNN model achieves better results, with a diacritic error rate (DER) of 10.72%, compared with 20.25% for the CRF model and 33.15% for the SMT model.
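
The abstract reports results as DER but does not say how it is computed. The sketch below shows one common way to compute a diacritic error rate, assuming it is the fraction of non-whitespace base characters whose predicted diacritics differ from the reference. The function names, the choice of counted characters, and the set of marks (U+064B to U+0652) are illustrative assumptions, not details taken from the paper.

```python
# Assumption: the marks of interest are the Arabic combining marks
# U+064B..U+0652 (tanween, short vowels, shadda, sukun).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}


def split_diacritics(text):
    """Split text into (base_char, sorted_marks) pairs.

    Marks are sorted so that, e.g., shadda+fatha and fatha+shadda
    compare as equal.
    """
    pairs = []
    for ch in text:
        if ch in DIACRITICS:
            if pairs:  # a mark with no preceding base character is ignored
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + [ch])
        else:
            pairs.append((ch, []))
    return [(base, tuple(sorted(marks))) for base, marks in pairs]


def diacritic_error_rate(reference, hypothesis):
    """Fraction of non-whitespace base characters with wrong diacritics.

    Hypothetical helper: assumes both strings share the same base
    characters and differ only in their diacritics.
    """
    ref = split_diacritics(reference)
    hyp = split_diacritics(hypothesis)
    if [b for b, _ in ref] != [b for b, _ in hyp]:
        raise ValueError("reference and hypothesis base characters differ")

    total = 0
    errors = 0
    for (base, ref_marks), (_, hyp_marks) in zip(ref, hyp):
        if base.isspace():
            continue  # assumption: whitespace is not scored
        total += 1
        if ref_marks != hyp_marks:
            errors += 1
    return errors / total if total else 0.0


if __name__ == "__main__":
    # Three letters (kaf, ta, ba); the hypothesis puts a kasra on the
    # second letter where the reference has a fatha.
    reference = "\u0643\u064E\u062A\u064E\u0628\u064E"
    hypothesis = "\u0643\u064E\u062A\u0650\u0628\u064E"
    print(f"DER = {diacritic_error_rate(reference, hypothesis):.2%}")
```

Published evaluations vary in what they count (for example, whether word-final case endings or undiacritized characters are included), so figures computed this way are comparable only when the same convention is applied to every model.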