Journal: Computer Speech and Language

Text-to-speech synthesis system with Arabic diacritic recognition system


Abstract

Text-to-speech (TTS) synthesis has been widely studied for many languages. However, speech synthesis for Arabic has not made sufficient progress and is still at an early stage. Statistical parametric synthesis based on hidden Markov models has been the most commonly applied approach for Arabic. Recently, speech synthesized with deep neural networks was found to be as intelligible as the human voice. This paper describes a TTS synthesis system for Modern Standard Arabic based on a statistical parametric approach and Mel-cepstral coefficients. Deep neural networks have achieved state-of-the-art performance in a wide range of tasks, including speech synthesis. Our TTS system includes a diacritization system, which is very important for Arabic TTS applications. Our diacritization system is also based on deep neural networks. In addition to the use of deep learning techniques, different methods are proposed to model the acoustic parameters in order to address the problem of acoustic model accuracy. They are based on linguistic and acoustic characteristics (e.g., a letter-position-based diacritization system, a unit-type-based synthesis system, and a diacritic-mark-based synthesis system) and on deep learning techniques (stacked generalization). Experimental results show that our diacritization system can generate diacritized text with high accuracy. As regards the speech synthesis system, the experimental results and subjective evaluation show that our proposed method can generate intelligible and natural speech.
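To make the diacritization task concrete, the following is a minimal sketch of how diacritized Arabic text could be turned into per-letter training examples. It is an illustration only, not the authors' pipeline: the abstract does not describe the data format, so the function names (`split_diacritics`, `position_tag`, `make_examples`, `strip_diacritics`) and the four-way position feature are assumptions. The only fact relied on is that the common Arabic diacritic marks occupy the Unicode range U+064B to U+0652.

```python
# Arabic diacritic marks (tashkeel): fathatan through sukun, U+064B..U+0652.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def split_diacritics(word):
    """Return a list of (letter, marks) pairs for one diacritized word."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding letter
                letter, marks = pairs[-1]
                pairs[-1] = (letter, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def position_tag(index, length):
    """Hypothetical letter-position feature (an assumption, not from the paper)."""
    if length == 1:
        return "single"
    if index == 0:
        return "initial"
    if index == length - 1:
        return "final"
    return "medial"


def make_examples(text):
    """Turn diacritized text into (letter, position, diacritic label) examples."""
    examples = []
    for word in text.split():
        pairs = split_diacritics(word)
        for i, (letter, marks) in enumerate(pairs):
            examples.append((letter, position_tag(i, len(pairs)), marks))
    return examples


def strip_diacritics(text):
    """Remove diacritic marks, giving the undiacritized input a model would see."""
    return "".join(ch for ch in text if ch not in DIACRITICS)
```

A model of the kind the abstract describes would take the output of `strip_diacritics` as input and predict the label in the third field of each example; the position field stands in for the letter-position information the abstract mentions.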
