Experiments on unsupervised statistical parametric speech synthesis

Abstract

Building web-based voicefonts requires an unsupervised method that automates the extraction of acoustic and linguistic properties of speech. This paper examines the impact of automatic speech transcription on statistical parametric speech synthesis, using a 100-hour speech corpus from a single speaker and focusing on two factors that affect speech quality: transcript accuracy and the size of the training dataset. Experimental results indicate that for an unsupervised method to achieve fair (MOS 3) voice quality, 1.5 hours of speech are necessary when phone accuracy is over 80%, and 3.5 hours are necessary when phone accuracy drops to 65%. The improvement in MOS is not significant when more than 4 hours of speech are used. Using automatic transcripts does lead to voice degradation. One mechanism behind this is that transcript errors cause mismatches between speech segments and phone labels, and these mismatches significantly distort the structure of the decision trees in the resulting HMM-based voices.
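The abstract reports phone accuracy figures (80%, 65%) without defining the metric. The sketch below assumes the common edit-distance definition, accuracy = (N - S - D - I) / N, where N is the number of reference phones and S, D, I count substitutions, deletions, and insertions in a minimum-cost alignment. The function name `phone_accuracy` and the toy phone sequences are illustrative assumptions, not taken from the paper, and the authors may have scored their transcripts differently.

```python
def phone_accuracy(reference, hypothesis):
    """Return (N - S - D - I) / N using a Levenshtein alignment.

    Assumption: this is the usual edit-distance-based accuracy; the
    paper's exact scoring procedure is not stated in the abstract.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one phone")

    # dist[i][j] = minimum number of edits turning reference[:i] into hypothesis[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i reference phones
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j hypothesis phones

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + mismatch,  # match or substitution
                dist[i - 1][j] + 1,             # deletion
                dist[i][j - 1] + 1,             # insertion
            )

    return (n - dist[n][m]) / n


# Hypothetical example: one substituted phone out of six.
reference = "k ae t s ae t".split()
hypothesis = "k ae t s eh t".split()
accuracy = phone_accuracy(reference, hypothesis)
```

Because insertions are penalized, this measure can fall below zero when the automatic transcript contains many more phones than the reference.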

Bibliographic details

Source
2012 8th International Symposium on Chinese Spoken Language Processing | 2012 | pp. 155-159 | 5 pages
Conference location: Hong Kong (HK)
Authors
Ni Jinfu; Shiga Yoshinori; Kawai Hisashi; Kashioka Hideki
Author affiliation

Spoken Language Communication Laboratory, Universal Communication Research Institute, National Institute of Information and Communications Technology, Kyoto, Japan;

Original format: PDF
Language of text: English
Chinese Library Classification: Artificial intelligence theory
Keywords
HMM-based speech synthesis; Voice degradation; automatic speech transcription; unsupervised method; web-based voicefonts;

Date indexed: 2022-08-26 14:05:18

Similar literature

1. Discriminative Multi-Stream Postfilters Based on Deep Learning for Enhancing Statistical Parametric Speech Synthesis [J]. Marvin Coto-Jiménez. Biomimetics. 2021, No. 12
2. Duration modelling and evaluation for Arabic statistical parametric speech synthesis [J]. Zangar Imene, Mnasri Zied, Colotte Vincent. Multimedia Tools and Applications. 2021, No. 6
3. Excitation modelling using epoch features for statistical parametric speech synthesis [J]. M Kiran Reddy, K Sreenivasa Rao. Computer Speech and Language. 2020, No. Mara
4. Experiments on unsupervised statistical parametric speech synthesis [C]. Ni Jinfu, Shiga Yoshinori, Kawai Hisashi. International Symposium on Chinese Spoken Language Processing. 2012
5. Statistical Parametric Speech Synthesis using Deep Learning Architectures [D]. Kang, Shiyin. 2016
6. Discriminative Multi-Stream Postfilters Based on Deep Learning for Enhancing Statistical Parametric Speech Synthesis [O]. Marvin Coto-Jiménez. 2021
7. Directly Modeling Speech Waveforms by Neural Networks for Statistical Parametric Speech Synthesis [O]. Keiichi Tokuda, Heiga Zen. 2015
