Conference: Irish Signals and Systems Conference

Integrating a Voice Analysis-Synthesis System with a TTS Framework for Controlling Affect and Speaker Identity


Abstract

This paper reports an experiment exploring how a voice analysis-synthesis system, GlórCáil, can be used to add expressiveness to the synthetic voice in text-to-speech (TTS) systems. The implementation focuses on the Irish ABAIR TTS voices, where such voice control would facilitate many current and envisaged applications. GlórCáil allows voice control of synthesized speech, and for this experiment it was integrated into a DNN-based TTS framework. Utterances were generated with f0, voice quality and vocal tract parameter manipulations targeting shifts in speaker identity and in the affective coloring of utterances. The scaling factors used for the manipulations were suggested in an earlier study. They involved global changes without sentence-internal dynamic variation, with a view to ascertaining whether such global shifts might alter listeners’ perception of speaker identity and affect. Results demonstrate affect shifts compatible with expectations. However, there were confounding factors. The female and child voices were poorly differentiated, which was expected given the similarity in the scaling factors used. The affect transformations suggest that the baseline voice had an intrinsically sad quality, so that there is weak differentiation between the sad and no-emotion stimuli. The male angry voice was the least successful, suggesting that dynamic, within-utterance variation is essential for the signaling of certain affects.
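
As a rough illustration of what a global manipulation "without sentence-internal dynamic variation" could look like, the sketch below multiplies every voiced frame of an f0 contour by one constant factor, so the contour's shape is preserved and only its overall level shifts. It covers f0 only, not voice quality or vocal tract parameters. The function name, the contour values, the factor of 1.5 and the convention that 0.0 marks an unvoiced frame are all assumptions made for this example; none of them come from the paper or from GlórCáil.

```python
from typing import List


def scale_f0_globally(f0_hz: List[float], factor: float) -> List[float]:
    """Multiply every voiced frame of an f0 contour by one constant factor.

    Frames with f0 == 0.0 are treated as unvoiced and left at 0.0
    (an assumed convention, not taken from the paper).
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [f * factor if f > 0.0 else 0.0 for f in f0_hz]


# Hypothetical contour in Hz; 0.0 marks unvoiced frames.
contour = [0.0, 110.0, 120.0, 115.0, 0.0, 100.0]

# Illustrative factor only; the study's actual scaling factors are not stated here.
raised = scale_f0_globally(contour, 1.5)
```

Because the same factor is applied to every voiced frame, the ratios between frames are unchanged, which is the sense in which such a shift is global rather than dynamic.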
