Modeling prosody patterns for Chinese expressive text-to-speech synthesis


Abstract

This paper proposes an approach for modeling the prosody patterns of acoustic features for Chinese expressive text-to-speech (TTS) synthesis. Based on the observation that a speaker usually tends to put more emphasis on one particular syllable within a multi-syllabic prosodic word, we identify that syllable as the core syllable, which can be derived from the semantic stress and tone information of the text prompt. We then classify the syllables in speech into four classes based on their relation to the core syllable in a prosodic word. We analyze contrastive (neutral versus expressive) speech recordings for each of the four classes and develop a perturbation model that takes the prosody pattern into account to transform neutral speech into expressive speech. Perceptual experiments on both neutral speech recordings and neutral TTS outputs, involving 13 subjects, indicate that the proposed approach can significantly enhance the expressivity of the synthesized speech.
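The pipeline the abstract describes (locate the core syllable, label every syllable by its relation to it, then perturb neutral prosody per class) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not name the four classes, so the labels `CORE`, `BEFORE_CORE`, `AFTER_CORE`, and `MONOSYLLABIC` are hypothetical, the scale factors are invented placeholders, and the plain multiplicative scaling stands in for the paper's perturbation model (which the keywords describe as non-linear). The core syllable index is taken as given rather than derived from semantic stress and tone.

```python
from dataclasses import dataclass
from enum import Enum


class SyllableClass(Enum):
    # Hypothetical labels: the abstract states only that there are four
    # classes defined relative to the core syllable, not what they are.
    CORE = "core"
    BEFORE_CORE = "before_core"
    AFTER_CORE = "after_core"
    MONOSYLLABIC = "monosyllabic"


@dataclass
class Syllable:
    text: str
    f0_mean_hz: float   # mean pitch of the neutral syllable
    duration_ms: float  # duration of the neutral syllable


def classify(word_len: int, index: int, core_index: int) -> SyllableClass:
    """Label a syllable by its position relative to the core syllable."""
    if word_len == 1:
        return SyllableClass.MONOSYLLABIC
    if index == core_index:
        return SyllableClass.CORE
    if index < core_index:
        return SyllableClass.BEFORE_CORE
    return SyllableClass.AFTER_CORE


# Invented (f0_scale, duration_scale) pairs, for illustration only.
# A real model would be estimated from contrastive recordings.
PERTURBATION = {
    SyllableClass.CORE: (1.20, 1.15),
    SyllableClass.BEFORE_CORE: (1.05, 1.00),
    SyllableClass.AFTER_CORE: (0.95, 1.05),
    SyllableClass.MONOSYLLABIC: (1.10, 1.10),
}


def perturb_word(syllables: list[Syllable], core_index: int) -> list[Syllable]:
    """Apply the class-dependent scaling to one prosodic word."""
    result = []
    for i, syl in enumerate(syllables):
        cls = classify(len(syllables), i, core_index)
        f0_scale, dur_scale = PERTURBATION[cls]
        result.append(
            Syllable(syl.text, syl.f0_mean_hz * f0_scale, syl.duration_ms * dur_scale)
        )
    return result


if __name__ == "__main__":
    # Placeholder syllables; the second one is assumed to be the core.
    word = [Syllable("s1", 200.0, 180.0), Syllable("s2", 210.0, 200.0),
            Syllable("s3", 190.0, 170.0)]
    for syl in perturb_word(word, core_index=1):
        print(syl)
```

Keeping the class assignment separate from the per-class parameters means the lookup table could be swapped for a fitted, non-linear mapping without touching the classification step.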


Bibliographic details

Source
2010 7th International Symposium on Chinese Spoken Language Processing | 2010 | pp. 148-152 | 5 pages
Original format: PDF
Chinese Library Classification: Artificial intelligence theory
Keywords
expressive text-to-speech (TTS); non-linear perturbation model; prosody pattern


Similar literature


1. Modeling the Expressivity of Input Text Semantics for Chinese Text-to-Speech Synthesis in a Spoken Dialog System [J]. Zhiyong Wu, Meng H.M., Hongwu Yang. IEEE Transactions on Audio, Speech, and Language Processing. 2009, No. 8
2. Cross-Dialect Adaptation Framework for Constructing Prosodic Models for Chinese Dialect Text-to-Speech Systems [J]. Chen-Yu Chiang. IEEE/ACM Transactions on Audio, Speech, and Language Processing. 2018, No. 1
3. Prosody modeling for syllable based text-to-speech synthesis using feedforward neural networks [J]. Reddy V. Ramu, Rao K. Sreenivasa. Neurocomputing. 2016, No. JANa1
4. Modeling prosody patterns for Chinese expressive text-to-speech synthesis [C]. {missing}. International Symposium on Chinese Spoken Language Processing. 2010
5. Building a prosodically sensitive diphone database for a Korean text-to-speech synthesis system [D]. Yoon, Kyuchul. 2005
6. Voice Quality Modelling for Expressive Speech Synthesis [O]. Carlos Monzo, Ignasi Iriondo, Joan Claudi Socoró
7. Modeling Prosody Patterns for Chinese Expressive Text-to-Speech Synthesis [O]. Zhiyong Wu, Lianhong Cai, Helen M. Meng. 2015
