Prosody-Enhanced Mandarin Text-to-Speech System

Abstract

End-to-end text-to-speech (TTS), which generates speech directly from a sequence of graphemes or phonemes, outperforms conventional TTS. It can generate high-quality speech, but it still cannot control local prosody such as word-level emphasis. Although the prominence of synthesized speech can be adjusted with explicit prosody tags, acquiring such tags is time-consuming and laborious. This paper focuses on a deep neural prominence prediction module. The Continuous Wavelet Transform (CWT) is used to analyze the prosodic signal of the input data and obtain a continuous prominence value for each Chinese character in the text. These values guide the training of a prominence prediction network, which learns to map input text to the prominence value of each character. The proposed method requires no manual labeling of the training data, so the resulting prosody control system is fully automatic. Experiments show that the proposed system generates more natural and expressive speech.
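
The abstract does not say which prosodic signal, mother wavelet, or scales the authors use, so the following is only a minimal sketch of the general idea of deriving per-character prominence from a CWT. It assumes the prosodic signal is a frame-level F0 contour, uses a Mexican hat wavelet, and takes the maximum summed coefficient inside each character's frame span as that character's prominence. The function names, the scale values, and the `char_spans` alignment format are all hypothetical and are not taken from the paper.

```python
import math


def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet, unnormalized. Assumed choice."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)


def cwt_row(signal, scale):
    """CWT coefficients at one scale by direct summation (zero-padded edges)."""
    n = len(signal)
    half = int(math.ceil(4 * scale))  # the wavelet is close to zero beyond 4 scales
    row = []
    for i in range(n):
        acc = 0.0
        for k in range(-half, half + 1):
            j = i + k
            if 0 <= j < n:
                acc += signal[j] * mexican_hat(k / scale)
        row.append(acc / math.sqrt(scale))
    return row


def char_prominence(signal, char_spans, scales=(2.0, 4.0, 8.0)):
    """Return one continuous prominence value per character.

    signal:     frame-level prosodic values (assumed here to be F0 in Hz)
    char_spans: hypothetical (start, end) frame indices for each character
    scales:     illustrative wavelet scales, in frames
    """
    # Z-score the contour so the result does not depend on the speaker's pitch range.
    mean = sum(signal) / len(signal)
    var = sum((x - mean) ** 2 for x in signal) / len(signal)
    std = math.sqrt(var) or 1.0  # guard against a perfectly flat contour
    z = [(x - mean) / std for x in signal]

    # Sum the coefficients across scales, then take the peak inside each span.
    rows = [cwt_row(z, s) for s in scales]
    combined = [sum(row[i] for row in rows) for i in range(len(z))]
    return [max(combined[start:end]) for start, end in char_spans]


if __name__ == "__main__":
    # Synthetic contour: flat 100 Hz with a pitch bump on the second character.
    f0 = [100.0 + 50.0 * math.exp(-0.5 * ((i - 15) / 2.0) ** 2) for i in range(30)]
    spans = [(0, 10), (10, 20), (20, 30)]
    print(char_prominence(f0, spans))
```

In a pipeline like the one the abstract describes, values of this kind would serve as automatically derived training targets for the prominence prediction network, replacing manual prosody tags.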

Bibliographic details

Source
International Conference on Advances in Computer Technology, Information Science and Communication, 2021, pp. 67-71 (5 pages)
Authors
Fang Niu; Wushour Silamu
Original format: PDF
Keywords
Training; Information science; Continuous wavelet transforms; Training data; Tools; Wavelet analysis; Control systems

Similar literature

1. Pitch models of Mandarin text-to-speech [J]. SHAO Yan-qiu, SUI Zhi-fang, HAN Ji-qing. 哈尔滨工业大学学报（英文版）. 2009, No. 2
2. A HMM-based Mandarin Chinese Singing Voice Synthesis System [J]. Xian Li, Zengfu Wang. 自动化学报（英文版）. 2016, No. 2
3. Nonlinear Time-Frequency Distributions of Spectrum Energy Operator in Large Vocabulary Mandarin Speaker Independent Speech Recognition System [J]. Fadhil H. T. Al-dulaimy, WANG Zuoying (王作英). 清华大学学报（英文版）. 2003, No. 6
4. High-Quality Prosody Generation in Mandarin Text-to-Speech System [J]. Qing Guo, Jie Zhang, Nobuyuki Katae. Fujitsu Scientific & Technical Journal. 2010, No. 1
5. A novel prosody adaptation method for Mandarin concatenation-based text-to-speech system [J]. Jian Yu, Jianhua Tao. Acoustical science and technology. 2009, No. 1
6. Stress predicition for Mandarin text-to-speech system using discourse context feature [C]. Che Hao, Tao Jianhua. 2013 International Conference on Oriental COCOSDA. 2013
7. Building a prosodically sensitive diphone database for a Korean text-to-speech synthesis system [D]. Yoon, Kyuchul. 2005
8. Adaptive and Longitudinal Pharmaceutical Care Instruction Using an Interactive Voice Response/Text-to-Speech System [O]. Gamal Hussein, Nancy Kawahara. 2006
9. A novel prosody adaptation method for Mandarin concatenation-based text-to-speech system [O]. Jian Yu, Jianhua Tao. 2009
10. Text-To-Speech Phrasing Enhancement System Using Neural Networks [R]. Julig, L. F. 1995
