Journal: IEEE Transactions on Audio, Speech, and Language Processing

Thousands of Voices for HMM-Based Speech Synthesis–Analysis and Application of TTS Systems Built on Various ASR Corpora


Abstract

In conventional speech synthesis, large amounts of phonetically balanced speech data recorded in highly controlled recording studio environments are typically required to build a voice. Although using such data is a straightforward solution for high quality synthesis, the number of voices available will always be limited, because recording costs are high. On the other hand, our recent experiments with HMM-based speech synthesis systems have demonstrated that speaker-adaptive HMM-based speech synthesis (which uses an “average voice model” plus model adaptation) is robust to non-ideal speech data that are recorded under various conditions and with varying microphones, that are not perfectly clean, and/or that lack phonetic balance. This enables us to consider building high-quality voices on “non-TTS” corpora such as ASR corpora. Since ASR corpora generally include a large number of speakers, this leads to the possibility of producing an enormous number of voices automatically. In this paper, we demonstrate the thousands of voices for HMM-based speech synthesis that we have made from several popular ASR corpora such as the Wall Street Journal (WSJ0, WSJ1, and WSJCAM0), Resource Management, GlobalPhone, and SPEECON databases. We also present the results of associated analysis based on perceptual evaluation, and discuss remaining issues.
