
Prosody-Enhanced Mandarin Text-to-Speech System



Abstract

End-to-end Text-to-Speech (TTS), which generates speech directly from a sequence of graphemes or phonemes, outperforms conventional TTS. It produces high-quality speech but still cannot control local prosody such as word-level emphasis. The prominence of synthesized speech can be adjusted with explicit prosody tags, but acquiring such tags is often time-consuming and laborious. This paper focuses on a deep neural prominence prediction module. The Continuous Wavelet Transform (CWT) is used to analyze the prosodic signal of the input data and obtain a continuous prominence value for each Chinese character in the text. These values guide the training of a prominence prediction network, which learns to map input text to the prominence value of each of its characters. Because the method requires no manual labeling of the training data, it yields a fully automatic prosody control system. Experiments show that the proposed system generates more natural and expressive speech.
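
The CWT-based labeling step described in the abstract can be illustrated with a rough sketch. The abstract does not say which prosodic signal, wavelet, scales, or aggregation rule the authors use, so everything here is an assumption made for illustration: a single frame-level contour stands in for the prosodic signal, a Mexican hat wavelet is used, the scales `(8, 16, 32)` are arbitrary, and the names `mexican_hat`, `cwt`, and `char_prominence` are hypothetical. Character boundaries are assumed to be available as frame spans, for example from a forced aligner.

```python
import numpy as np

def mexican_hat(width, scale):
    """Mexican hat (Ricker) wavelet sampled on `width` points (assumed wavelet choice)."""
    t = np.arange(width) - (width - 1) / 2.0
    x = t / scale
    norm = 2.0 / (np.sqrt(3.0 * scale) * np.pi ** 0.25)
    return norm * (1.0 - x ** 2) * np.exp(-0.5 * x ** 2)

def cwt(signal, scales):
    """CWT by direct convolution; returns one row per scale (zero-padded at the edges)."""
    signal = np.asarray(signal, dtype=float)
    rows = []
    for s in scales:
        # Keep the kernel odd and no longer than the signal so that
        # mode="same" returns exactly len(signal) samples.
        width = int(min(10 * s, len(signal)))
        if width % 2 == 0:
            width -= 1
        rows.append(np.convolve(signal, mexican_hat(width, s), mode="same"))
    return np.vstack(rows)

def char_prominence(contour, spans, scales=(8, 16, 32)):
    """Continuous prominence in [0, 1] for each character span (hypothetical rule).

    contour: frame-level prosodic signal (assumption: one combined contour).
    spans:   list of (start_frame, end_frame) pairs, one per character.
    """
    contour = np.asarray(contour, dtype=float)
    std = contour.std()
    z = (contour - contour.mean()) / std if std > 0 else np.zeros_like(contour)
    # Assumption: sum the wavelet response over scales, then take the peak per character.
    summed = cwt(z, scales).sum(axis=0)
    raw = np.array([summed[a:b].max() for a, b in spans])
    lo, hi = raw.min(), raw.max()
    return (raw - lo) / (hi - lo) if hi > lo else np.zeros_like(raw)

# Synthetic example: three characters of 100 frames each, with a bump in the second.
frames = np.arange(300)
contour = np.exp(-0.5 * ((frames - 150) / 15.0) ** 2)
spans = [(0, 100), (100, 200), (200, 300)]
prominence = char_prominence(contour, spans)
```

In this synthetic example the bump lies inside the second span, so that character receives the largest value. Values like these would serve as regression targets for a text-to-prominence network, which is the role the abstract assigns to the CWT output; the network itself is not sketched here.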

