Venue: International Joint Conference on Natural Language Processing

A System for Diacritizing Four Varieties of Arabic

Abstract

Short vowels, also known as diacritics, are often omitted when writing the different varieties of Arabic, including Modern Standard Arabic (MSA), Classical Arabic (CA), and Dialectal Arabic (DA). However, diacritics are required to pronounce words properly, which makes diacritic restoration (a.k.a. diacritization) essential for language learning and text-to-speech applications. In this paper, we present a system for diacritizing MSA, CA, and two varieties of DA, namely Moroccan and Tunisian. The system uses a character-level sequence-to-sequence deep learning model that requires no feature engineering and outperforms all previous state-of-the-art (SOTA) systems on all the Arabic varieties that we test on.
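To make the task concrete, the following is a minimal sketch of how character-level training pairs for diacritic restoration might be prepared: the undiacritized text serves as the source sequence and the fully diacritized text as the target. This is an illustration only, not the authors' pipeline. The function names `strip_diacritics` and `make_training_pair` are hypothetical, and the sketch assumes that the relevant marks are the eight Unicode code points U+064B through U+0652 (tanween forms, fatha, damma, kasra, shadda, sukun).

```python
import re

# Assumption: the diacritics of interest are U+064B..U+0652
# (fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun).
DIACRITICS = re.compile("[\u064B-\u0652]")


def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritic marks, leaving only the base characters."""
    return DIACRITICS.sub("", text)


def make_training_pair(diacritized: str) -> tuple[list[str], list[str]]:
    """Build one (source, target) pair of character sequences.

    source: characters of the text with diacritics removed
    target: characters of the original, fully diacritized text
    """
    source = list(strip_diacritics(diacritized))
    target = list(diacritized)
    return source, target


# Example word: kaf + fatha, ta + fatha, ba + fatha ("kataba").
word = "\u0643\u064E\u062A\u064E\u0628\u064E"
src, tgt = make_training_pair(word)
print(len(src), len(tgt))
```

In this framing a sequence-to-sequence model would read `src` one character at a time and learn to emit `tgt`; the actual input representation and architecture used by the system are not specified in the abstract.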
