Journal: Computer Speech and Language

Animated Lombard speech: Motion capture, facial animation and visual intelligibility of speech produced in adverse conditions

Abstract

In this paper we study the production and perception of speech in diverse conditions for the purposes of accurate, flexible and highly intelligible talking face animation. We recorded audio, video and facial motion capture data of a talker uttering a set of 180 short sentences under three conditions: normal speech (in quiet), Lombard speech (in noise), and whispering. We then produced an animated 3D avatar similar in shape and appearance to the original talker, and used an error minimization procedure to drive the animated version of the talker in a way that matched the original performance as closely as possible. In a perceptual intelligibility study with degraded audio, we then compared the animated talker against the real talker and the audio alone, in terms of audio-visual word recognition rate across the three production conditions. We found that the visual intelligibility of the animated talker was on par with the real talker for the Lombard and whisper conditions. In addition, we created two incongruent conditions in which normal speech audio was paired with animated Lombard speech or whispering. Compared to the congruent normal speech condition, Lombard animation yielded a significant increase in intelligibility, despite the AV-incongruence. In a separate evaluation, we gathered subjective opinions on the different animations and found that some degree of incongruence was generally accepted.
