Journal: Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies

Generation of suprasegmental information for speech using a recurrent neural network and binary gravitational search algorithm for feature selection

Abstract

Suprasegmental (prosodic) features of discourse are a means by which speakers convey their intentions to listeners. Generating suitable prosody information is critical to expressing messages and improving the intelligibility and naturalness of synthetic speech. A generic prosody generator should supply speech synthesizers with pitch frequency (F0) contours, energy levels, word durations, and inter-word pause durations. The present study used a recurrent neural network (RNN) for prosody generation. The inputs to this RNN were word-level and syllable-level linguistic features. To provide data efficiently for the RNN-based prosody generator in the training, validation, and test phases, phonemes were segmented and labeled automatically. The number of inputs to the RNN was reduced by employing a binary gravitational search algorithm (BGSA) for feature selection (FS). The proposed prosody generator produced 12 output prosodic parameters for the current syllable, representing the pitch contour, log-energy contour, inter-syllable pause duration, syllable duration, duration of the vowel in the syllable, and vowel onset time. Experimental results demonstrated that the RNN-based prosody generator synthesized these six prosodic elements with acceptable root mean square error (RMSE). With a BGSA-based FS unit, a lighter neural model was achieved, with a 53% reduction in the number of weight connections and RMSEs showing acceptable degradation relative to the prosody generator without the FS unit. The BGSA-based FS method was compared with a binary particle swarm optimization (BPSO) algorithm, and the BGSA showed slightly better results. A modified mean opinion score scale was used to evaluate the intelligibility and naturalness of speech synthesized with the proposed method.
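To make the feature-selection step more concrete, the following is a minimal sketch of a binary gravitational search over feature masks. It is not the authors' implementation: the abstract gives no algorithmic details, so the update rules follow the commonly described BGSA scheme (masses from fitness, Hamming distance between agents, bit flips with probability |tanh(v)|), and every name and parameter here (`bgsa_select`, `toy_cost`, `USEFUL`, `n_agents`, `g0`, `v_max`) is a hypothetical choice for illustration. In the study the cost would presumably be derived from the RNN's RMSE; here a toy cost stands in for it.

```python
import math
import random

USEFUL = {0, 3, 5}  # hypothetical indices of "informative" features (toy data)


def toy_cost(mask):
    """Toy stand-in for a validation error: penalise missing useful
    features heavily and every selected feature lightly."""
    missing = sum(1 for d in USEFUL if mask[d] == 0)
    return 1.0 * missing + 0.1 * sum(mask)


def bgsa_select(cost, n_features, n_agents=20, n_iters=60,
                g0=1.0, v_max=6.0, seed=0):
    """Minimise cost(mask) over binary masks; returns (best_mask, best_cost)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_agents)]
    vel = [[0.0] * n_features for _ in range(n_agents)]
    best_mask, best_cost = None, math.inf

    for t in range(n_iters):
        costs = [cost(p) for p in pos]
        for p, c in zip(pos, costs):
            if c < best_cost:
                best_mask, best_cost = p[:], c

        # Masses: the lower the cost, the heavier the agent.
        lo, hi = min(costs), max(costs)
        raw = [(hi - c) / (hi - lo) if hi > lo else 1.0 for c in costs]
        total = sum(raw)
        mass = [r / total for r in raw]

        g = g0 * (1.0 - t / n_iters)  # gravitational constant decays linearly
        # Only the k best agents attract others; k shrinks towards 1.
        k = max(1, round(n_agents * (1.0 - t / n_iters)))
        kbest = sorted(range(n_agents), key=lambda i: costs[i])[:k]

        new_pos = []
        for i in range(n_agents):
            acc = [0.0] * n_features
            for j in kbest:
                dist = sum(a != b for a, b in zip(pos[i], pos[j]))  # Hamming
                if dist == 0:
                    continue  # identical masks (or j == i) exert no pull
                scale = rng.random() * g * mass[j] / dist
                for d in range(n_features):
                    acc[d] += scale * (pos[j][d] - pos[i][d])
            bits = pos[i][:]
            for d in range(n_features):
                v = rng.random() * vel[i][d] + acc[d]
                vel[i][d] = max(-v_max, min(v_max, v))
                # Larger |velocity| means a higher chance of flipping the bit.
                if rng.random() < abs(math.tanh(vel[i][d])):
                    bits[d] = 1 - bits[d]
            new_pos.append(bits)
        pos = new_pos

    return best_mask, best_cost


mask, err = bgsa_select(toy_cost, n_features=8)
print("selected features:", [d for d, bit in enumerate(mask) if bit])
print("toy cost:", err)
```

The search is stochastic, so the returned mask is the best one evaluated during the run rather than a guaranteed optimum; fixing `seed` only makes a run repeatable.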