Journal: Information Processing & Management

Improved Arabic speech recognition system through the automatic generation of fine-grained phonetic transcriptions


Abstract

This paper investigates how best to exploit the phonological properties of Arabic to improve the performance of an automatic speech recognition (ASR) system. A main challenge in processing Arabic is the effect of local context, which changes the phonetic realization of a given text and thereby causes the recognition engine to misclassify it. The proposed solution is a set of language-dependent grapheme-to-allophone rules that predict such allophonic variation and so supply the ASR system with a phonetic transcription that is sensitive to local context. The novel aspect of the method is that the pronunciation of each word is extracted directly from this context-sensitive transcription rather than from a predefined dictionary, which typically does not reflect the word's actual pronunciation. The paper also uses stress, one of the suprasegmental characteristics of speech, to enhance acoustic modelling. The rules were evaluated by comparing a dictionary-based system against one that uses the automatically generated phonetic transcription. Eliminating the fixed dictionary and learning the phone probabilities from the generated transcription improved system performance by 9.3% on average. Marking stressed vowels with separate stress markers yielded a further improvement of 1.7%. © 2017 Elsevier Ltd. All rights reserved.
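
The abstract does not list the rules themselves, so the following is only a minimal sketch of what a context-sensitive rewrite rule can look like. Both rules (nasal assimilation of /n/ to [m] before /b/, and backing of /a/ next to an emphatic consonant), the romanized symbols, the `#` word-boundary marker, and the function name `to_allophones` are assumptions chosen for illustration; they are not taken from the paper's rule set.

```python
# Toy context-sensitive rewrite rules over a romanized phoneme sequence.
# All symbols and both rules are illustrative assumptions, not the paper's rules.

EMPHATICS = {"S", "D", "T", "Z"}  # assumed symbols for the emphatic consonants
BOUNDARY = "#"                    # assumed word-boundary marker


def to_allophones(phonemes):
    """Map a list of phoneme symbols to allophone symbols using local context."""
    segs = list(phonemes)
    out = []
    for i, p in enumerate(segs):
        prev_seg = segs[i - 1] if i > 0 else None
        next_seg = segs[i + 1] if i + 1 < len(segs) else None
        # Next real segment, looking across a word boundary if there is one.
        next_sound = next((q for q in segs[i + 1:] if q != BOUNDARY), None)

        if p == "n" and next_sound == "b":
            # Rule 1: /n/ is realized as [m] before /b/, even across words.
            out.append("m")
        elif p == "a" and (prev_seg in EMPHATICS or next_seg in EMPHATICS):
            # Rule 2: /a/ is backed (written "A") next to an emphatic consonant.
            out.append("A")
        else:
            out.append(p)
    return out


if __name__ == "__main__":
    print(to_allophones(["m", "i", "n", "#", "b", "a", "y", "t"]))
    print(to_allophones(["T", "a", "l", "a", "b"]))
```

In a system of the kind the abstract describes, the output of such rules would replace the fixed dictionary entry as the pronunciation used to train the acoustic model, so the same word can receive different transcriptions in different contexts.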