Two-stage phone duration modelling with feature construction and feature vector extension for the needs of speech synthesis

Abstract

We propose a two-stage phone duration modelling scheme that can be applied to improve prosody modelling in speech synthesis systems. In the first stage, a number of independent feature constructors (FCs) operate on an initial feature vector made up of numerical and non-numerical linguistic features extracted from text. In the second stage, a phone duration model (PDM) operates on an extended feature vector, which is obtained by appending the FCs' phone duration predictions to the initial feature vector. Experiments on the American English KED TIMIT database and the Modern Greek WCL-1 database validated the advantage of the proposed scheme. It improved prediction accuracy over the best individual predictor and over a two-stage scheme that simply fuses the first-stage outputs. Compared with the best individual predictor, the mean absolute error and the root mean square error were both reduced by 3.9% (relative) on KED TIMIT, and by 4.8% and 4.6% (relative), respectively, on WCL-1.
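The two-stage scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions and not the authors' implementation. The learners `fit_ridge` (a ridge regression solved in pure Python) and `fit_category_mean` (a per-category mean lookup) are hypothetical stand-ins for the paper's feature constructors and PDM. Non-numerical features are assumed to be integer-coded beforehand, and the toy data are synthetic. The `relative_reduction` helper assumes the usual definition, (baseline - proposed) / baseline, for figures such as the reported 3.9%.

```python
"""Two-stage phone duration modelling: a minimal, hypothetical sketch.

Stage 1: independent feature constructors (FCs) each predict phone duration
from the initial feature vector. Stage 2: a phone duration model (PDM) is
trained on the extended vector = initial features + FC predictions.
The learners below are illustrative stand-ins, not the paper's models.
"""
from statistics import mean
from typing import Callable, Dict, List, Sequence

Predictor = Callable[[Sequence[float]], float]
Fitter = Callable[[Sequence[Sequence[float]], Sequence[float]], Predictor]


def fit_ridge(X: Sequence[Sequence[float]], y: Sequence[float],
              lam: float = 1e-3) -> Predictor:
    """Hypothetical learner: ridge regression with an unpenalised bias,
    solved from the normal equations by Gauss-Jordan elimination."""
    rows = [[1.0, *x] for x in X]  # prepend a bias column
    d = len(rows[0])
    # Augmented system [R^T R + lam*I | R^T y]; the bias term is not penalised.
    M = [[sum(r[i] * r[j] for r in rows) + (lam if i == j and i > 0 else 0.0)
          for j in range(d)] + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(d)]
    for c in range(d):
        p = max(range(c, d), key=lambda k: abs(M[k][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(d):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    w = [M[i][d] / M[i][i] for i in range(d)]
    return lambda x: w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def fit_category_mean(index: int) -> Fitter:
    """Hypothetical FC: mean training duration for each value of one
    integer-coded categorical feature (e.g. phone identity)."""
    def fit(X: Sequence[Sequence[float]], y: Sequence[float]) -> Predictor:
        groups: Dict[float, List[float]] = {}
        for x, t in zip(X, y):
            groups.setdefault(x[index], []).append(t)
        table = {k: mean(v) for k, v in groups.items()}
        fallback = mean(y)  # unseen category -> global mean duration
        return lambda x: table.get(x[index], fallback)
    return fit


def extend(x: Sequence[float], fcs: Sequence[Predictor]) -> List[float]:
    """Append each FC's duration prediction to the initial feature vector."""
    return [*x, *(fc(x) for fc in fcs)]


def fit_two_stage(X: Sequence[Sequence[float]], y: Sequence[float],
                  fc_fitters: Sequence[Fitter],
                  pdm_fitter: Fitter = fit_ridge) -> Predictor:
    # Stage 1: train each feature constructor independently.
    fcs = [fit(X, y) for fit in fc_fitters]
    # Stage 2: train the PDM on the extended feature vectors. For brevity the
    # FC outputs come from the training data itself (no held-out folds).
    pdm = pdm_fitter([extend(x, fcs) for x in X], y)
    return lambda x: pdm(extend(x, fcs))


def mae(y: Sequence[float], p: Sequence[float]) -> float:
    return mean(abs(a - b) for a, b in zip(y, p))


def rmse(y: Sequence[float], p: Sequence[float]) -> float:
    return mean((a - b) ** 2 for a, b in zip(y, p)) ** 0.5


def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative error reduction in percent (assumed usual definition)."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    # Synthetic toy data: [phone class, position feature] -> duration (ms).
    X = [[float(c), float(f)] for c in range(3) for f in range(5)]
    y = [(40.0, 90.0, 60.0)[int(c)] + 3.0 * f for c, f in X]
    single = fit_ridge(X, y)
    two = fit_two_stage(X, y, [fit_ridge, fit_category_mean(0)])
    e1 = mae(y, [single(x) for x in X])
    e2 = mae(y, [two(x) for x in X])
    print(f"MAE single={e1:.2f} two-stage={e2:.2f} "
          f"reduction={relative_reduction(e1, e2):.1f}%")
```

In this sketch the PDM is trained on FC outputs computed from its own training data, so the toy gain is optimistic. In stacked models, a held-out or cross-validated first stage is the usual safeguard.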

Bibliographic details

  • Source
    Computer Speech and Language, 2012, issue 4, pp. 274-292 (19 pages)
  • Author affiliations (all five authors)

    Wire Communications Laboratory, Department of Electrical and Computer Engineering, University of Patras, 26500 Rion-Patras, Greece;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
  • Original format: PDF
  • Language: English
  • Keywords

    feature construction; phone duration modelling; statistical modelling; text-to-speech synthesis;
