Neural network-based F0 text-to-speech synthesiser for Mandarin

Hwang S.-H.; Chen S.-H.

Abstract

A neural-network-based approach to synthesising F0 information for Mandarin text-to-speech is discussed. The basic idea is to use neural networks to model the relationship between linguistic features extracted from input text and parameters representing the pitch contour of syllables. Two MLPs are used to separately synthesise the mean and shape of the pitch contour, using different linguistic features. A large set of utterances is employed to train these MLPs using the well-known back-propagation algorithm. Pronunciation rules for generating F0 information are automatically learned and implicitly memorised by the MLPs. In the synthesis, parameters representing the mean and shape of the pitch contour of each syllable are generated using linguistic features extracted from the given input text. Simulation results confirmed that this is a promising approach for F0 synthesis. The resulting synthesised pitch contours of syllables match well with their original counterparts. Average root mean square errors of 0.94 ms/frame and 1.00 ms/frame were achieved.
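
The two-network arrangement described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature dimensions, hidden-layer size, number of shape coefficients, learning rate and the synthetic training data are all invented for the example, and the names `MLP` and `rmse` are hypothetical.

```python
import numpy as np


class MLP:
    """One-hidden-layer perceptron trained by batch back-propagation (illustrative)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)

    def forward(self, X):
        self.h = np.tanh(X @ self.W1 + self.b1)  # hidden activations
        return self.h @ self.W2 + self.b2        # linear output layer

    def train_step(self, X, Y, lr=0.01):
        """One gradient step on mean squared error; returns the loss before the step."""
        err = self.forward(X) - Y
        grad_out = 2.0 * err / X.shape[0]
        # Back-propagate through the tanh hidden layer (uses W2 before it is updated).
        grad_h = (grad_out @ self.W2.T) * (1.0 - self.h ** 2)
        self.W2 -= lr * (self.h.T @ grad_out)
        self.b2 -= lr * grad_out.sum(axis=0)
        self.W1 -= lr * (X.T @ grad_h)
        self.b1 -= lr * grad_h.sum(axis=0)
        return float((err ** 2).sum(axis=1).mean())


def rmse(a, b):
    """Root mean square error between two equally shaped arrays."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic stand-in data: real inputs would be linguistic features from text.
rng = np.random.default_rng(1)
n_syllables = 200
X_mean = rng.normal(size=(n_syllables, 6))    # assumed features for the mean network
X_shape = rng.normal(size=(n_syllables, 4))   # assumed (different) features for the shape network
Y_mean = np.tanh(X_mean @ rng.normal(size=(6, 1)))     # one pitch-mean value per syllable
Y_shape = np.tanh(X_shape @ rng.normal(size=(4, 3)))   # three shape coefficients (assumed count)

# Two separate networks, one for the pitch mean and one for the contour shape.
mean_net = MLP(6, 8, 1, seed=2)
shape_net = MLP(4, 8, 3, seed=3)

mean_losses = [mean_net.train_step(X_mean, Y_mean) for _ in range(500)]
shape_losses = [shape_net.train_step(X_shape, Y_shape) for _ in range(500)]

# Synthesis for one syllable: each network produces its own parameters.
pred_mean = mean_net.forward(X_mean[:1])
pred_shape = shape_net.forward(X_shape[:1])
```

In an actual system the predicted mean and shape parameters would be combined to reconstruct each syllable's pitch contour, and a function such as `rmse` would compare synthesised and original contours frame by frame.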

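Bibliographic details

Source: IEE Proceedings. Part K, 1994, Issue 6, pp. 384-390 (7 pages)
Authors: Hwang S.-H.; Chen S.-H.
Original format: PDF
Language: English
Classification (Chinese Library Classification): Electroacoustic technology and speech signal processing; Electrical engineering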

Similar literature

1. Neural network synthesiser of pause duration for Mandarin text-to-speech [J]. Shaw-Hwa Hwang, Sin-Horng Chen. Electronics Letters, 1992, No. 8.
2. F0 Contour Modeling for Arabic Text-to-Speech Synthesis Using Fujisaki Parameters and Neural Networks [J]. Fatouma Boukadida, Noureddine Ellouze, Zied Mnasri. Signal Processing: An International Journal, 2011, No. 6.
3. High-Quality Prosody Generation in Mandarin Text-to-Speech System [J]. Qing Guo, Jie Zhang, Nobuyuki Katae. Fujitsu Scientific & Technical Journal, 2010, No. 1.
4. A first study on neural net based generation of prosodic and spectral information for Mandarin text-to-speech [C]. Sin-Horng Chen, Shaw-Hwa Hwang. 1992.
5. Cognition Modulates Neural Responsiveness During Voluntary Voice F0 Control [D]. Atkins, Christopher. 2012.
6. Effect of F0 contour on perception of Mandarin Chinese speech against masking [O]. Meihong Wu. 2012.
7. Do Text-to-Speech Synthesisers Pronounce Correctly? A Preliminary Study [O]. D. G. Evans, E. A. Draffan, A. James. 2015.
8. Text-To-Speech Phrasing Enhancement System Using Neural Networks [R]. Julig, L. F. 1995.
