Journal: IEEE Transactions on Audio, Speech, and Language Processing

Evaluation of Expressive Speech Synthesis With Voice Conversion and Copy Resynthesis Techniques



Abstract

Generating expressive synthetic voices requires carefully designed databases that contain a sufficient amount of expressive speech material. This paper investigates voice conversion and modification techniques to reduce database collection and processing efforts while maintaining acceptable quality and naturalness. In a factorial design, we study the relative contributions of voice quality and prosody as well as the amount of distortion introduced by the respective signal manipulation steps. The unit selection engine in our open-source, modular text-to-speech (TTS) framework MARY is extended with voice quality transformation using either GMM-based prediction or vocal tract copy resynthesis. These algorithms are then cross-combined with various prosody copy resynthesis methods. The overall expressive speech generation process functions as a postprocessing step on TTS outputs to transform neutral synthetic speech into aggressive, cheerful, or depressed speech. Cross-combinations of voice quality and prosody transformation algorithms are compared in listening tests for perceived expressive style and quality. The results show that there is a tradeoff between identification and naturalness. Combined modeling of both voice quality and prosody leads to the best identification scores at the expense of the lowest naturalness ratings. The fine detail of both voice quality and prosody, as preserved by the copy synthesis, contributed to better identification than the approximate models achieved.
