Journal: IEEE Transactions on Audio, Speech and Language Processing

An objective and subjective study of the role of semantics and prosodic features in building corpora for emotional TTS

Abstract

Building a text corpus suitable for corpus-based speech synthesis is a time-consuming process that usually requires some human intervention to select the desired phonetic content and the necessary variety of prosodic contexts. If an emotional text-to-speech (TTS) system is desired, the complexity of the corpus generation process increases. This paper presents a study that aims to validate or reject the use of a semantically neutral text corpus for recording both neutral and emotional (acted) speech. Using this kind of text would eliminate the need to include semantically emotional texts in the corpus. The study was performed for the Basque language. It compares, both subjectively and objectively, the prosodic characteristics of emotional speech recorded using semantically neutral texts and semantically emotional texts. At the same time, the experiments allow an evaluation of the capability of prosody to carry emotional information in the Basque language. Prosody manipulation is the most common processing tool used in concatenative TTS. Experiments on automatic recognition of the emotions considered in this paper (the "Big Six emotions") show that prosody is an important emotional indicator, but that it cannot be the only manipulated parameter in an emotional TTS system, at least not for all the emotions. Resynthesis experiments transferring prosody from emotional to neutral speech have also been performed. They corroborate the results and support the use of text with neutral semantic content in databases for emotional speech synthesis.
