Conference: 2012 8th International Symposium on Chinese Spoken Language Processing

Experiments on unsupervised statistical parametric speech synthesis


Abstract

To build web-based voicefonts, an unsupervised method is needed to automate the extraction of acoustic and linguistic properties of speech. This paper examines the impact of automatic speech transcription on statistical parametric speech synthesis, using a 100-hour speech corpus from a single speaker and focusing on two factors that affect speech quality: transcript accuracy and training dataset size. Experimental results indicate that, for an unsupervised method to achieve fair (MOS 3) voice quality, 1.5 hours of speech are needed when phone accuracy is above 80%, and 3.5 hours when phone accuracy falls to 65%. MOS does not improve significantly when more than 4 hours of speech are used. Using automatic transcripts does degrade voice quality. One mechanism behind this is that transcript errors cause mismatches between speech segments and phone labels, which significantly distort the structure of the decision trees in the resulting HMM-based voices.
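
The abstract reports phone accuracy figures (80%, 65%) without stating how they were computed. As a minimal sketch, assuming the common convention accuracy = (N - S - D - I) / N, where N is the number of reference phones and S, D, I are the substitutions, deletions, and insertions in a minimum-edit alignment, the measure could be computed as follows. The function name `phone_accuracy` and the example phone strings are illustrative and do not come from the paper.

```python
def phone_accuracy(reference, hypothesis):
    """Return (N - S - D - I) / N from a Levenshtein alignment of two phone lists.

    Assumption: this is the conventional definition; the paper's exact
    scoring procedure is not given in the abstract.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one phone")
    # prev[j] = minimum edits to turn the first i-1 reference phones
    # into the first j hypothesis phones
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference phone missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis phone)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    errors = prev[m]  # S + D + I for the minimum-edit alignment
    return (n - errors) / n


# Hypothetical example: one substitution (e -> a) and one inserted "ou".
ref = "sil h e l ou sil".split()
hyp = "sil h a l ou ou sil".split()
accuracy = phone_accuracy(ref, hyp)  # (6 - 2) / 6
```

Under this definition the value can drop below zero when the automatic transcript contains many insertions, which is one reason accuracy is usually reported alongside the reference length.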
