Reducing out-of-vocabulary in morphology to improve the accuracy in Arabic dialects speech recognition

Abstract

This thesis has two aims: developing resources for Arabic dialects and improving speech recognition for those dialects. Two important components are considered: the Pronunciation Dictionary (PD) and the Language Model (LM). The work comprises six parts, which cover building and evaluating dialect resources and improving the performance of dialect speech recognition systems.

Three resources are built and evaluated: one tool and two corpora. The methodology used to build the multi-dialect morphology analyser involves proposing and evaluating linguistic and statistical bases. We obtained an overall accuracy of 94%. The dialect text corpora cover four sub-dialects, with more than 50 million tokens. The multi-dialect speech corpora contain 32 hours of speech collected from 52 participants, amounting to more than 67,000 speech files.

The main objective is to improve the PDs and LMs of Arabic dialects. An incremental methodology made it possible to check orthography and phonology rules one step at a time and to identify the rules that positively affected the PDs. The Word Error Rate (WER) improved by 5.3% in MSA and 5% in Levantine.

Three levels of morphemes were used to improve the LMs of dialects: stem, prefix+stem and stem+suffix. We checked the three forms using two different types of LMs. Eighteen experiments were carried out on MSA, the Gulf dialect and the Egyptian dialect, all of which yielded positive results, with WERs reduced by 0.5% to 6.8%.
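As a point of reference for the WER figures quoted in the abstract, the sketch below computes WER in its standard form: the word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words. It is an illustration only, not the evaluation code used in the thesis. The function name `word_error_rate` and the whitespace tokenisation are assumptions, and the abstract does not say whether the reported improvements are absolute or relative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance / number of reference words.

    Hypothetical helper for illustration; tokenisation by whitespace is an
    assumption, not something stated in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For instance, a four-word reference recognised with one substituted word and one dropped word gives two errors over four reference words, so a WER of 0.5.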
Bibliographic details

  • Author

    Almeman Khalid Abdulrahman

  • Year 2015
  • Original format PDF
  • Language English
