IEEE Workshop on Spoken Language Technology

Median-based generation of synthetic speech durations using a non-parametric approach

Abstract

This paper proposes a new approach to duration modelling for statistical parametric speech synthesis in which a recurrent statistical model is trained to output a phone transition probability at each timestep (acoustic frame). Unlike conventional approaches to duration modelling, which assume that duration distributions have a particular form (e.g., a Gaussian) and use the mean of that distribution for synthesis, our approach can in principle model any distribution supported on the non-negative integers. Generation from this model can be performed in many ways; here we consider output generation based on the median predicted duration. The median is more typical (more probable) than the conventional mean duration, is robust to training-data irregularities, and enables incremental generation. Furthermore, a frame-level approach to duration prediction is consistent with a longer-term goal of modelling durations and acoustic features together. Results indicate that the proposed method is competitive with baseline approaches in approximating the median duration of held-out natural speech.
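
The link between per-frame transition probabilities and a median duration can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `duration_pmf` and `median_duration` are hypothetical, durations are assumed to be counted in frames starting at 1, and the transition probabilities are assumed to come from some already-trained model. Under those assumptions, the probability that a phone lasts exactly d frames is the transition probability at frame d times the probability of not having left the phone in any earlier frame, and the median is the first frame at which the cumulative probability reaches 0.5.

```python
from typing import Iterable, List


def duration_pmf(transition_probs: List[float]) -> List[float]:
    """Return P(D = d) for d = 1..T implied by per-frame transition probabilities.

    Assumption: transition_probs[d - 1] is the probability of leaving the
    phone at frame d, given that the phone has lasted up to frame d.
    """
    pmf = []
    survival = 1.0  # probability that the phone has not ended before this frame
    for p in transition_probs:
        pmf.append(survival * p)
        survival *= 1.0 - p
    return pmf


def median_duration(transition_probs: Iterable[float]) -> int:
    """Return the smallest d with P(D <= d) >= 0.5.

    The probabilities are consumed one frame at a time, so the decision to
    end the phone never needs frames beyond the median itself.
    """
    survival = 1.0
    frames = 0
    for p in transition_probs:
        frames += 1
        survival *= 1.0 - p
        if survival <= 0.5:  # equivalent to cumulative probability >= 0.5
            return frames
    # Hypothetical handling: the sketch refuses to guess if the stream ends early.
    raise ValueError("cumulative probability never reached 0.5")


if __name__ == "__main__":
    probs = [0.1, 0.2, 0.3, 0.4, 0.5]  # made-up values for illustration
    print(duration_pmf(probs))
    print(median_duration(probs))
```

Because `median_duration` stops at the first frame where the survival probability falls to 0.5 or below, it accepts a generator that produces one probability per frame, which mirrors the incremental generation mentioned in the abstract. No parametric form is imposed on the distribution: any sequence of probabilities in [0, 1] defines a valid duration distribution.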