Emotional speech resynthesis.

Abstract

Emotions play an important role in human life. They are essential for communication, for decision making, and for survival. They form a challenging research area across diverse disciplines such as psychology, sociology, philosophy, medicine, and engineering. One realm of inquiry concerns emotions expressed in speech. This study focuses on angry, happy, sad, and neutral emotions in speech. We investigate the acoustic correlates of speech that are important for emotion perception in utterances and propose techniques to synthesize emotional speech that human listeners will recognize correctly. The motivation for our research comes from the desire to give machines emotion processing capabilities in order to make human-machine interactions more pleasant, effective, and productive. Instead of generating emotional speech from text, our approach starts with a natural neutral utterance and modifies its acoustic features to impart the targeted emotion. As shown by the analysis and recognition studies, spectral and prosodic (F0, duration, energy) parameters can be used successfully to describe and recognize emotions. In this study we use these acoustic parameters for emotion resynthesis and follow an experimental methodology to investigate how they should be modified to produce angry, happy, or sad speech. Based on the experimental results, a multi-level emotion-to-emotion transformation (ETET) system is proposed. This novel system is capable of generating good-quality emotional speech. It consists of three main components that modify speech acoustic parameters at different time scales. First, spectral conversion is applied at the phoneme level; then prosody parameters are statistically estimated and modified at the part-of-speech (POS) tag level; finally, automatically selected modification factors are applied to voiced and unvoiced regions. The proposed ETET system is robust and can be easily adapted to new emotions and speakers. Emotional speech synthesis is a challenging new research area. We believe that the ideas, results, and discussions presented in this study will benefit the rapidly growing body of research on emotions in speech.


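Bibliographic record

Author: Bulut, Murtaza
Author affiliation: University of Southern California
Degree-granting institution: University of Southern California
Subject: Engineering, Electronics and Electrical; Computer Science
Degree: Ph.D.
Year: 2008
Pages: 269 p.
Total pages: 269
Original format: PDF
Language of text: English
Chinese Library Classification: Radio electronics and telecommunications technology; automation and computer technology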

Similar literature

1. Language identification with suprasegmental cues: a study based on speech resynthesis. [J]. Ramus F, Mehler J. The Journal of the Acoustical Society of America. 1999, No. 1
2. SUST Bangla Emotional Speech Corpus (SUBESCO): An audio-only emotional speech corpus for Bangla [J]. Sadia Sultana, M. Shahidur Rahman, M. Reza Selim. PLoS One. 2021, No. 4
3. Categorical and Dimensional Ratings of Emotional Speech: Behavioral Findings From the Morgan Emotional Speech Set [J]. Shae D. Morgan. Journal of speech, language, and hearing research: JSLHR. 2019, No. 11
4. TOWARD RELAYING EMOTIONAL STATE FOR SPEECH-TO-SPEECH TRANSLATOR: ESTIMATION OF EMOTIONAL STATE FOR SYNTHESIZING SPEECH WITH EMOTION [C]. Masato Akagi, Reda Elbarougy. International Congress on Sound and Vibration. 2014
5. Emotional speech: A quantitative study of vocal acoustics in emotional expression. [D]. Katz, Gary Scott. 1998
6. SUST Bangla Emotional Speech Corpus (SUBESCO): An audio-only emotional speech corpus for Bangla [O]. Sadia Sultana, M. Shahidur Rahman, M. Reza Selim. 2021
