Journal: Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Deep Learning Framework with Confused Sub-Set Resolution Architecture for Automatic Arabic Diacritization

Abstract

The Arabic language belongs to a group of languages whose characters require diacritization. Modern Standard Arabic (MSA) transcripts omit the diacritics, which are essential for many machine learning tasks such as Text-To-Speech (TTS) systems. In this work, Arabic diacritics restoration is tackled under a deep learning framework that includes the Confused Sub-set Resolution (CSR) method to improve classification accuracy, in addition to an Arabic Part-of-Speech (PoS) tagging framework using deep neural nets. Special focus is given to syntactic diacritization, which still suffers from low accuracy, as indicated in prior works. The framework is evaluated against state-of-the-art systems reported in the literature on challenging datasets collected from different domains. Standard datasets such as the LDC Arabic Tree Bank are used in addition to custom ones we have made available online to allow other researchers to replicate these results. Results show a significant improvement of the proposed techniques over other approaches: the syntactic classification error drops to 9.9% and the morphological classification error to 3%, compared to 12.7% and 3.8% for the best results reported in the literature, an error reduction of about 22% relative to the best reported systems.
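The abstract does not say how the 22% figure was computed. A minimal sketch, assuming it is the relative reduction in error rate against the best previously reported system, reproduces it from the four error rates quoted above. The helper name `relative_error_reduction` is hypothetical and not taken from the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the fraction of the baseline error removed by the new system.

    Assumption: "improving the error by 22%" means a relative reduction,
    i.e. (baseline - new) / baseline. The abstract does not confirm this.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Error rates (in percent) as quoted in the abstract.
syntactic = relative_error_reduction(12.7, 9.9)
morphological = relative_error_reduction(3.8, 3.0)

print(f"syntactic relative reduction: {syntactic:.1%}")
print(f"morphological relative reduction: {morphological:.1%}")
```

Under this assumption the syntactic error falls by about 22.0% and the morphological error by about 21.1%, so the quoted 22% matches the syntactic figure most closely.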
