Conference: International Conference on Circuit, Power and Computing Technologies

LP and TD-PSOLA-based incorporation of happiness in neutral speech using time-domain parameters



Abstract

Emotions express a person's internal state and are reflected in speech. They affect the time-domain characteristics of the speech signal, namely intonation patterns, speech rate, and the short-term energy function. Conventional text-to-speech (TTS) systems produce speech for a given text without any emotion, which is referred to as neutral speech. Building a TTS system that produces speech with a desired emotion is not trivial, because a separate speech corpus must be carefully collected for each emotion and a system built on it. The current work therefore focuses on incorporating happiness into neutral speech using signal processing algorithms. Neutral and happy speech are analyzed, and happiness is found to be perceived in certain emotive words in a sentence. To introduce happiness into neutral speech, these emotive keywords are identified and the time-domain parameters listed above are modified. Happy speech is first synthesized using linear prediction (LP); to improve the quality of the synthesized speech, TD-PSOLA is then used. Subjective evaluation yields a mean opinion score of 2.05 (out of a maximum of 3) for happy speech synthesized using linear prediction and 2.53 for happy speech synthesized using TD-PSOLA.
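The abstract does not give the modification procedure in detail, so the following is only a minimal sketch of how the three time-domain parameters could be altered on one keyword. It is not the authors' implementation. It assumes a voiced keyword segment with a constant, already known pitch period (in samples), evenly spaced pitch marks, and hand-picked scaling factors. All function names (`short_term_energy`, `psola_constant_pitch`, `emphasize_keyword`) and parameters are hypothetical.

```python
import numpy as np

def short_term_energy(x, frame_len, hop):
    """Sum of squared samples per frame (rectangular window, for simplicity)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.array([np.sum(x[i * hop : i * hop + frame_len] ** 2)
                     for i in range(n_frames)])

def psola_constant_pitch(x, period, pitch_factor=1.0, duration_factor=1.0):
    """Simplified TD-PSOLA: assumes a constant pitch period (in samples).

    pitch_factor > 1 raises the pitch; duration_factor > 1 lengthens the segment.
    Requires len(x) > 2 * period so that at least one pitch mark exists.
    """
    # Analysis pitch marks, assumed evenly spaced (real speech needs a pitch marker).
    marks = np.arange(period, len(x) - period, period)
    window = np.hanning(2 * period + 1)           # two-period grain window
    new_period = int(round(period / pitch_factor))
    out_len = int(round(len(x) * duration_factor))
    y = np.zeros(out_len)
    norm = np.zeros(out_len)
    # Synthesis pitch marks at the new spacing.
    for t in np.arange(period, out_len - period, new_period):
        # Map the synthesis mark back to the nearest analysis mark.
        src = marks[np.argmin(np.abs(marks - t / duration_factor))]
        grain = x[src - period : src + period + 1] * window
        y[t - period : t + period + 1] += grain   # overlap-add
        norm[t - period : t + period + 1] += window
    # Compensate for the varying window overlap; leave uncovered samples at zero.
    return np.divide(y, norm, out=np.zeros_like(y), where=norm > 1e-3)

def emphasize_keyword(x, start, end, period, pitch_factor, duration_factor, gain):
    """Modify pitch, duration and energy of x[start:end] and splice it back."""
    seg = psola_constant_pitch(x[start:end], period, pitch_factor, duration_factor)
    return np.concatenate([x[:start], gain * seg, x[end:]])

if __name__ == "__main__":
    fs = 8000
    x = np.sin(2 * np.pi * 100 * np.arange(fs) / fs)  # synthetic 100 Hz "voiced" signal
    out = emphasize_keyword(x, 2000, 4000, period=80,
                            pitch_factor=1.25, duration_factor=1.2, gain=1.5)
    print(len(x), len(out))
```

A real system would estimate pitch marks from the signal and choose the scaling factors from an analysis of happy versus neutral recordings. The fixed values here only illustrate the mechanics.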
