Thousands of Voices for HMM-Based Speech Synthesis–Analysis and Application of TTS Systems Built on Various ASR Corpora

Yamagishi J.; Usabaev B.; King S.; Watts O.; Dines J.; Tian J.; Guan Y.; Hu R.; Oura K.; Wu Y.-J.; Tokuda K.; Karhila R.; Kurimo M.


Abstract

In conventional speech synthesis, large amounts of phonetically balanced speech data recorded in highly controlled recording studio environments are typically required to build a voice. Although using such data is a straightforward solution for high quality synthesis, the number of voices available will always be limited, because recording costs are high. On the other hand, our recent experiments with HMM-based speech synthesis systems have demonstrated that speaker-adaptive HMM-based speech synthesis (which uses an “average voice model” plus model adaptation) is robust to non-ideal speech data that are recorded under various conditions and with varying microphones, that are not perfectly clean, and/or that lack phonetic balance. This enables us to consider building high-quality voices on “non-TTS” corpora such as ASR corpora. Since ASR corpora generally include a large number of speakers, this leads to the possibility of producing an enormous number of voices automatically. In this paper, we demonstrate the thousands of voices for HMM-based speech synthesis that we have made from several popular ASR corpora such as the Wall Street Journal (WSJ0, WSJ1, and WSJCAM0), Resource Management, Globalphone, and SPEECON databases. We also present the results of associated analysis based on perceptual evaluation, and discuss remaining issues.
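The "average voice model plus model adaptation" idea can be illustrated with a toy sketch. This is not the adaptation algorithm used in the paper, which the abstract does not specify. It assumes each HMM state is reduced to a single mean vector and that adaptation is a per-dimension affine transform fitted by least squares. The function names `fit_diagonal_affine` and `adapt_means` and all numbers are hypothetical.

```python
from statistics import fmean


def fit_diagonal_affine(avg_means, target_obs):
    """Fit y ~ a*x + b independently for each feature dimension.

    avg_means:  average-voice mean vectors, one per aligned observation
    target_obs: target-speaker feature vectors aligned to those means
    Returns a list of (a, b) pairs, one per dimension.
    """
    dims = len(avg_means[0])
    transform = []
    for d in range(dims):
        xs = [m[d] for m in avg_means]
        ys = [o[d] for o in target_obs]
        x_bar, y_bar = fmean(xs), fmean(ys)
        var_x = sum((x - x_bar) ** 2 for x in xs)
        cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        # Fall back to a pure shift if this dimension has no spread.
        a = cov_xy / var_x if var_x > 0 else 1.0
        b = y_bar - a * x_bar
        transform.append((a, b))
    return transform


def adapt_means(state_means, transform):
    """Apply the shared transform to every state, seen or unseen."""
    return {
        state: [a * x + b for x, (a, b) in zip(mean, transform)]
        for state, mean in state_means.items()
    }


# Hypothetical average-voice model with three states.
average_voice = {"a": [1.0, 0.0], "i": [2.0, 1.0], "u": [3.0, 4.0]}

# Adaptation data covers only "a" and "i" (no phonetic balance).
seen_states = ["a", "i"]
target_data = [[3.0, -1.0], [5.0, -0.5]]

transform = fit_diagonal_affine(
    [average_voice[s] for s in seen_states], target_data
)
adapted_voice = adapt_means(average_voice, transform)
```

Because the transform is shared across states, the state "u" is moved toward the target speaker even though no adaptation data covers it. In this simplified picture, that sharing is one reason adaptation can tolerate corpora that lack phonetic balance.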


Bibliographic details

Source: IEEE Transactions on Audio, Speech, and Language Processing, 2010, no. 5, pp. 984-1004 (21 pages)
Affiliation: Centre for Speech Technology Research (CSTR), University of Edinburgh, Edinburgh, United Kingdom
Original format: PDF
Language: English
Keywords: Automatic speech recognition (ASR); H Triple S (HTS); SPEECON database; WSJ database; average voice; hidden Markov model (HMM)-based speech synthesis; speaker adaptation; speech synthesis; voice conversion


