Median-based generation of synthetic speech durations using a non-parametric approach

Abstract

This paper proposes a new approach to duration modelling for statistical parametric speech synthesis in which a recurrent statistical model is trained to output a phone transition probability at each timestep (acoustic frame). Unlike conventional approaches to duration modelling, which assume that duration distributions have a particular form (e.g., a Gaussian) and use the mean of that distribution for synthesis, our approach can in principle model any distribution supported on the non-negative integers. Generation from this model can be performed in many ways; here we consider output generation based on the median predicted duration. The median is more typical (more probable) than the conventional mean duration, is robust to training-data irregularities, and enables incremental generation. Furthermore, a frame-level approach to duration prediction is consistent with a longer-term goal of modelling durations and acoustic features together. Results indicate that the proposed method is competitive with baseline approaches in approximating the median duration of held-out natural speech.
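The abstract describes median-based generation only at a high level. As a minimal sketch, assume (this is an assumption, not something the abstract states) that the model's output h_t at frame t is the probability that the current phone ends at frame t given that it has not ended earlier, with durations counted in frames starting from 1. Then P(D > d) is the product of (1 - h_t) for t <= d, and the median duration is the first frame at which that survival probability falls to 0.5 or below. The function names below are hypothetical, this is not the authors' implementation, and a constant per-frame probability stands in for the recurrent network's outputs.

```python
import itertools
from typing import Iterable, Sequence


def median_duration(hazards: Iterable[float]) -> int:
    """Median duration (in frames) implied by per-frame transition probabilities.

    Assumes the t-th value (t = 1, 2, ...) is P(phone ends at frame t | not ended
    before t). The median is the smallest d with P(D <= d) >= 0.5, i.e. the first
    frame at which the survival probability P(D > d) drops to 0.5 or below.
    Values are consumed lazily, so a generator of frame-by-frame outputs works.
    """
    survival = 1.0  # P(D > 0): the phone has not ended before frame 1
    for d, h in enumerate(hazards, start=1):
        survival *= 1.0 - h  # P(D > d) = product of (1 - h_t) for t <= d
        if survival <= 0.5:
            return d  # decided without looking at any later frame
    raise ValueError("sequence ended before the median was reached")


def mean_duration(hazards: Sequence[float]) -> float:
    """Mean duration implied by the same probabilities (needs the full sequence).

    Probability mass beyond the end of the sequence is ignored (truncated).
    """
    survival = 1.0
    mean = 0.0
    for d, h in enumerate(hazards, start=1):
        mean += d * h * survival  # P(D = d) = h_d * P(D > d - 1)
        survival *= 1.0 - h
    return mean


if __name__ == "__main__":
    # Hypothetical stand-in for a recurrent model: constant 0.1 per frame,
    # which implies a geometric duration distribution.
    print(median_duration(itertools.repeat(0.1)))  # 7
    print(round(mean_duration([0.1] * 500), 6))  # 10.0
```

Because `median_duration` consumes the probabilities one frame at a time and stops as soon as the threshold is crossed, it never needs future frames, which is one way to read the abstract's point that the median enables incremental generation; the mean, by contrast, depends on the whole distribution. With a constant probability of 0.1 the implied distribution is geometric and right-skewed, so the median (7 frames) falls below the mean (10 frames).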

Bibliographic details

Source
IEEE Workshop on Spoken Language Technology | 2016 | pp. 686-692 | 7 pages
Authors
Srikanth Ronanki; Oliver Watts; Simon King; Gustav Eje Henter
Original format: PDF
Keywords
Hidden Markov models; Acoustics; Predictive models; Speech; Speech synthesis; Natural languages; Pragmatics

Similar documents

1. Non-Parametric Duration Modelling for Speech Synthesis with a Joint Model of Acoustics and Duration [J]. Gustav Eje Henter, Srikanth Ronanki, Oliver Watts. IEICE Technical Report: Speech. 2016, no. 414
2. A profile-free non-parametric approach towards generation of synthetic hourly global solar irradiation data from daily totals [J]. Hassan Muhammed A., Abubakr Mohamed, Khalil Adel. Renewable Energy. 2021, no. Apra
3. Bayesian non-parametric generation of fully synthetic multivariate categorical data in the presence of structural zeros [J]. Manrique-Vallier Daniel, Hu Jingchen. Journal of the Royal Statistical Society. 2018, no. pta3
4. Median-based generation of synthetic speech durations using a non-parametric approach [C]. Srikanth Ronanki, Oliver Watts, Simon King. IEEE Workshop on Spoken Language Technology. 2016
5. An organic chemistry approach toward the synthesis of valuable biological compounds: Synthetic progress toward the Palmerolide A subunits, expeditious enyne coupling via alkynes, and development of the next generation of HIV-1 integrase inhibitors [D]. Hadi, Victor. 2010
6. Generation of Synthetic Chest X-ray Images and Detection of COVID-19: A Deep Learning Based Approach [O]. Yash Karbhari, Arpan Basu, Zong Woo Geem. 2021
7. Median-based generation of synthetic speech durations using a non-parametric approach [O]. Ronanki, S., Watts, O., King, Simon. 2017
