
Investigating Deep Neural Network Adaptation for Generating Exclamatory and Interrogative Speech in Mandarin



Abstract

Currently, most speech synthesis systems generate speech only in a reading style, which greatly limits the expressiveness of the synthesized speech. To improve expressiveness, this paper focuses on generating exclamatory and interrogative speech for spoken Mandarin. We propose a multi-style (exclamatory and interrogative) deep neural network-based acoustic model with a style-specific layer (which may itself consist of multiple layers) and several shared hidden layers. The style-specific layer models the patterns that are distinct to each style, while the shared layers allow maximum knowledge sharing between declarative and multi-style speech. We investigate five major aspects of multi-style adaptation: neural network type and topology, the number of layers in the style-specific layer, the initial model, the adaptation parameters, and the adaptation corpus size. Both objective and subjective evaluations are carried out to assess the proposed method. Experimental results show that the proposed multi-style BLSTM with its top layer adapted outperforms our prior work (trained with a combination of constrained maximum likelihood linear regression and structural maximum a posteriori adaptation) and achieves the best performance. We also find that adapting both the spectral and excitation parameters is more effective than adapting the excitation parameters alone.
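The abstract describes the architecture only at a high level. The following is a minimal, hypothetical sketch (not the authors' implementation) of the general idea: shared hidden layers plus a style-specific top layer, where adaptation copies the declarative top layer as the initial model and updates only that copy on a small style corpus. Assumptions: feed-forward tanh layers stand in for the BLSTM layers, the output vector stands in for the spectral and excitation parameters, and all names (`MultiStyleAcousticModel`, `adapt`, `mse`), layer sizes, and learning-rate values are invented for illustration.

```python
import copy
import math
import random


def init_layer(n_in, n_out, rng):
    """Dense layer stored as a dict of weight rows and biases."""
    return {
        "w": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
        "b": [0.0] * n_out,
    }


def dense(layer, x, use_tanh=False):
    """Affine transform, optionally followed by tanh."""
    out = []
    for row, bias in zip(layer["w"], layer["b"]):
        z = sum(w * xi for w, xi in zip(row, x)) + bias
        out.append(math.tanh(z) if use_tanh else z)
    return out


class MultiStyleAcousticModel:
    """Hypothetical sketch: shared hidden layers plus one linear top layer per style.

    Feed-forward tanh layers stand in for the BLSTM layers of the paper.
    """

    def __init__(self, n_in, n_hidden, n_out, n_shared=2, seed=0):
        rng = random.Random(seed)
        dims = [n_in] + [n_hidden] * n_shared
        self.shared = [init_layer(a, b, rng) for a, b in zip(dims, dims[1:])]
        # Stand-in for a trained declarative (reading-style) top layer;
        # here it is only randomly initialized.
        self.heads = {"declarative": init_layer(n_hidden, n_out, rng)}

    def hidden(self, x):
        """Run the input through the shared hidden layers."""
        for layer in self.shared:
            x = dense(layer, x, use_tanh=True)
        return x

    def forward(self, x, style):
        """Predict acoustic parameters using the given style's top layer."""
        return dense(self.heads[style], self.hidden(x))

    def adapt(self, data, style, init_from="declarative", lr=0.05, epochs=50):
        """Adapt only the style-specific top layer with SGD on squared error.

        A new style's top layer starts as a copy of `init_from`. The shared
        layers are never updated, so they keep what was learned before.
        """
        if style not in self.heads:
            self.heads[style] = copy.deepcopy(self.heads[init_from])
        head = self.heads[style]
        for _ in range(epochs):
            for x, y in data:
                h = self.hidden(x)  # frozen shared representation
                pred = dense(head, h)
                for j, (p, t) in enumerate(zip(pred, y)):
                    g = 2.0 * (p - t)  # derivative of (p - t)^2 w.r.t. p
                    head["w"][j] = [w - lr * g * hk for w, hk in zip(head["w"][j], h)]
                    head["b"][j] -= lr * g


def mse(model, data, style):
    """Mean squared error of one style's outputs over (input, target) pairs."""
    errors = [(p - t) ** 2 for x, y in data for p, t in zip(model.forward(x, style), y)]
    return sum(errors) / len(errors)
```

For example, `model.adapt(exclamatory_data, "exclamatory")` (where `exclamatory_data` is a hypothetical list of input/target pairs) creates an exclamatory top layer from the declarative one and fine-tunes only that layer. Keeping the shared layers frozen is one simple way to share knowledge across styles while adapting few parameters; the paper itself compares several choices of which layers and parameters to adapt.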

Bibliographic details

  • Source
    Journal of VLSI signal processing systems | 2018, Issue 7 | pp. 1039-1052 | 14 pages
  • Author affiliations

    National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences;

    National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences;

    National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences;

    National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences;

    National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; CAS Center for Excellence in Brain Science and Intelligence Technology, Institute of Automation, Chinese Academy of Sciences;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • Keywords

    Speech synthesis; Excitation parameters; Deep neural network adaptation; Exclamatory speech; Interrogative speech;

