F_0 Modeling for Isarn Speech Synthesis using Deep Neural Networks and Syllable-level Feature Representation

Janyoi Pongsathon; Seresangtakul Pusadee


Abstract

The generation of the fundamental frequency (F_0) plays an important role in speech synthesis because it directly influences the naturalness of synthetic speech. In conventional parametric speech synthesis, F_0 is predicted frame by frame. This method is insufficient to represent F_0 contours over larger units, especially the tone contours of syllables in tonal languages, which deviate as a result of long-term context dependency. This work proposes a syllable-level F_0 model that represents F_0 contours within syllables using syllable-level F_0 parameters comprising sampled F_0 points and dynamic features. A Deep Neural Network (DNN) was used to represent the relationships between syllable-level contextual features and syllable-level F_0 parameters. The proposed model was examined using an Isarn speech synthesis system with both large and small training sets. For all training sets, the results of objective and subjective tests indicate that the proposed approach outperforms baseline systems based on hidden Markov models and DNNs that predict F_0 values at the frame level.
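As a rough illustration of what a syllable-level F_0 parameter vector of "sampled points plus dynamic features" could look like, the sketch below resamples one syllable's log-F_0 track to a fixed number of points and appends first-order differences. It is not the authors' implementation: the function name `syllable_f0_parameters`, the choice of five sample points, linear interpolation, and the use of simple finite differences as the dynamic features are all assumptions made for this example, since the abstract does not specify them.

```python
import numpy as np

def syllable_f0_parameters(log_f0, num_points=5):
    """Return [sampled points, deltas] for one syllable (hypothetical helper).

    log_f0: voiced log-F_0 values of the frames inside one syllable.
    num_points: number of equally spaced sample points (assumed value).
    """
    log_f0 = np.asarray(log_f0, dtype=float)
    if log_f0.size < 2:
        raise ValueError("need at least two voiced frames")
    # Normalised positions of the original frames and of the sample points.
    frame_pos = np.linspace(0.0, 1.0, log_f0.size)
    sample_pos = np.linspace(0.0, 1.0, num_points)
    # Static part: the contour resampled to a fixed length.
    static = np.interp(sample_pos, frame_pos, log_f0)
    # Dynamic part (assumption): central differences inside, one-sided at the edges.
    delta = np.gradient(static)
    return np.concatenate([static, delta])

# Usage: a linearly rising contour over 21 frames.
rising = np.linspace(5.0, 5.4, 21)
params = syllable_f0_parameters(rising, num_points=5)
print(params.shape)
```

A fixed-length vector of this kind gives every syllable the same output dimensionality regardless of its duration, which is what allows a feed-forward network to predict it from syllable-level contextual features.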


Bibliographic details

Source
The International Arab Journal of Information Technology | 2020, No. 6 | pp. 906-915 | 10 pages
Authors
Janyoi Pongsathon; Seresangtakul Pusadee
Affiliations

Khon Kaen Univ Dept Comp Sci Nat Language & Speech Proc Lab Khon Kaen Thailand;

Khon Kaen Univ Dept Comp Sci Fac Sci Khon Kaen Thailand;

Original format: PDF
Language: English
Keywords
Fundamental frequency; speech synthesis; deep neural networks;


