A Gesture-to-Emotional Speech Conversion by Combining Gesture Recognition and Facial Expression Recognition

Abstract

This paper proposes a sign-language-to-emotional-speech conversion method that integrates facial expression, aimed at easing communication between healthy people and people with speech disorders. First, sign language features and facial expression features are extracted by a deep neural network (DNN) model. Second, a support vector machine (SVM) is trained to classify the sign language and the facial expression, yielding the text of the sign language and the emotional tag of the facial expression. At the same time, a hidden Markov model (HMM)-based Mandarin-Tibetan bilingual emotional speech synthesizer is trained through speaker adaptive training on a Mandarin emotional speech corpus. Finally, Mandarin or Tibetan emotional speech is synthesized from the recognized sign language text and the emotional tags. Objective tests show a recognition rate of 90.7% for static sign language, while facial expression recognition reaches 94.6% on the extended Cohn-Kanade database (CK+) and 80.3% on the JAFFE database. Subjective evaluation shows that the synthesized emotional speech achieves an emotional mean opinion score of 4.0. The pleasure-arousal-dominance (PAD) three-dimensional emotion model is employed to evaluate the PAD values of both the facial expressions and the synthesized emotional speech. Results show that the PAD values of the facial expressions are close to those of the synthesized emotional speech, indicating that the synthesized speech can express the emotion conveyed by the facial expression.
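The pipeline described above lends itself to a compact illustration. The sketch below is a minimal, hypothetical Python example of the recognition stage (an SVM classifying DNN-extracted feature vectors into emotional tags) and of the PAD-space comparison used in the evaluation. The feature dimensions, class labels, and PAD coordinates are illustrative assumptions only and do not reproduce the paper's actual features, corpora, or models.

```python
# Minimal sketch of the DNN-feature + SVM recognition stage and the
# PAD-space comparison described in the abstract. All dimensions,
# labels, and values below are illustrative assumptions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-in for DNN-extracted facial-expression features:
# 200 samples of hypothetical 128-dimensional embeddings.
features = rng.normal(size=(200, 128))
# Stand-in emotional tags (e.g., 0=neutral, 1=happy, 2=sad, 3=angry).
labels = rng.integers(0, 4, size=200)

# Train an SVM on the embeddings, mirroring the recognition step.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(features, labels)

# At run time a new embedding is mapped to an emotional tag, which
# would then drive the emotional speech synthesizer.
tag = clf.predict(rng.normal(size=(1, 128)))[0]
print("predicted emotional tag:", tag)

# PAD evaluation: compare the pleasure-arousal-dominance values of the
# facial expression with those of the synthesized speech. Hypothetical
# coordinates on [-1, 1]; a small distance means the synthesized speech
# conveys roughly the same emotion as the face.
pad_face = np.array([0.55, 0.40, 0.30])
pad_speech = np.array([0.50, 0.35, 0.28])
print("PAD distance:", np.linalg.norm(pad_face - pad_speech))
```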