International Conference on Statistical Language and Speech Processing

Investigating the Relation Between Voice Corpus Design and Hybrid Synthesis Under Reduction Constraint

Abstract

Hybrid TTS systems generally optimise a cost function over the voice they are given in order to generate the best possible signal. That voice is built from a speech corpus, usually designed for a specific purpose. In this paper, we treat voice creation as a corpus design step performed under reduction constraints: a recording script is crafted to be optimal for the target TTS engine and its purpose. We investigate the impact of sharing information between this corpus design step and the optimisation step of the hybrid TTS system. Our starting point is a reduced voice optimised for a unit selection system using a CNN-based model. This baseline is compared with a hybrid TTS system whose target cost is the linguistic embedding built for the recording script design step. This approach is also compared with a standard hybrid TTS system trained only on the voice, which therefore has no information about the corpus design process. Objective measures and perceptual evaluations show that using the corpus design embedding as the target cost outperforms a classical hard-coded target cost. However, the feed-forward DNN acoustic model of the standard hybrid TTS system remains the best. This emphasises the importance of acoustic information in the TTS target cost, information that is not directly available before the voice is recorded.
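The abstract compares systems that differ mainly in what they use as the target cost inside the unit search. As a minimal sketch, assuming a standard Viterbi search over candidate units, a target cost taken as the Euclidean distance between a target position's linguistic embedding and each candidate unit's embedding, and a concatenation cost taken as the distance between the boundary acoustic features of adjacent units, the selection step could look like the code below. The abstract does not specify the search procedure or distance measures, so these are assumptions; the function `select_units`, the keys `emb`, `start`, `end`, and the weights `w_target`/`w_concat` are hypothetical, and the paper's CNN-based and DNN models are not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Each candidate unit is described by three hypothetical feature vectors:
#   "emb":   embedding compared against the target embedding (target cost)
#   "start": acoustic features at the unit's left boundary (concatenation cost)
#   "end":   acoustic features at the unit's right boundary (concatenation cost)
Unit = Dict[str, Sequence[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_units(
    targets: List[Sequence[float]],
    candidates: List[List[Unit]],
    w_target: float = 1.0,
    w_concat: float = 1.0,
) -> Tuple[List[int], float]:
    """Viterbi search for the candidate sequence with the lowest total cost.

    targets[i] is the (hypothetical) linguistic embedding of target position i;
    candidates[i] lists the corpus units that could realise position i.
    Returns the chosen candidate index for each position and the total cost.
    """
    # Target cost: distance between each candidate's embedding and the target embedding.
    target_costs = [
        [w_target * euclidean(unit["emb"], tgt) for unit in units]
        for tgt, units in zip(targets, candidates)
    ]

    cost = list(target_costs[0])  # best cumulative cost ending in each candidate
    backpointers: List[List[int]] = []
    for i in range(1, len(targets)):
        new_cost: List[float] = []
        pointers: List[int] = []
        for k, unit in enumerate(candidates[i]):
            # Concatenation cost: mismatch between the previous unit's end and
            # this unit's start, added to the best path reaching that previous unit.
            scores = [
                cost[j] + w_concat * euclidean(prev["end"], unit["start"])
                for j, prev in enumerate(candidates[i - 1])
            ]
            j_best = min(range(len(scores)), key=scores.__getitem__)
            new_cost.append(scores[j_best] + target_costs[i][k])
            pointers.append(j_best)
        cost = new_cost
        backpointers.append(pointers)

    # Backtrack from the cheapest final candidate.
    k = min(range(len(cost)), key=cost.__getitem__)
    total = cost[k]
    path = [k]
    for pointers in reversed(backpointers):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path, total


# Toy example with 1-D features: two target positions, two candidates each.
targets = [[0.0], [1.0]]
candidates = [
    [{"emb": [0.0], "start": [0.0], "end": [0.0]},
     {"emb": [5.0], "start": [0.0], "end": [9.0]}],
    [{"emb": [1.0], "start": [9.0], "end": [0.0]},
     {"emb": [1.5], "start": [0.0], "end": [0.0]}],
]
print(select_units(targets, candidates))  # → ([0, 1], 0.5)
```

In this toy case the second candidate at position 1 has a slightly worse target cost but joins smoothly with the first unit, so the search prefers it. In this sketch, swapping one target cost for another (a hard-coded linguistic distance, a corpus design embedding, or a predicted acoustic representation) only changes how `target_costs` is computed; the search itself stays the same.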