Journal: EURASIP Journal on Audio, Speech, and Music Processing

Acoustic-visual synthesis technique using bimodal unit-selection

Abstract

This paper presents a bimodal acoustic-visual synthesis technique that concurrently generates the acoustic speech signal and a 3D animation of the speaker’s outer face. This is done by concatenating bimodal diphone units that consist of both acoustic and visual information. In the visual domain, we mainly focus on the dynamics of the face rather than on rendering. The proposed technique overcomes the problems of asynchrony and incoherence inherent in classic approaches to audiovisual synthesis. The synthesis steps are similar to those of typical concatenative speech synthesis but are generalized to the acoustic-visual domain. The bimodal synthesis was assessed through perceptual and subjective evaluations. Overall, the results indicate that the proposed bimodal acoustic-visual synthesis technique provides intelligible speech in both the acoustic and visual channels.
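The idea of concatenating bimodal diphone units can be illustrated with a minimal sketch. The abstract does not give the paper's cost functions, features, or search procedure, so everything below is an assumption made for illustration: the `BimodalUnit` fields, the Euclidean join cost, the weights `w_acoustic` and `w_visual`, and the dynamic-programming search are hypothetical and stand in for whatever the authors actually use. The point it shows is that each candidate carries both modalities, so one selection decides the audio and the face motion together and the two cannot drift apart.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BimodalUnit:
    """Hypothetical candidate: one diphone with both modalities attached."""
    diphone: str
    acoustic_start: Sequence[float]  # acoustic features at the left boundary
    acoustic_end: Sequence[float]    # acoustic features at the right boundary
    visual_start: Sequence[float]    # face-shape features at the left boundary
    visual_end: Sequence[float]      # face-shape features at the right boundary


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equally sized feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def join_cost(left: BimodalUnit, right: BimodalUnit,
              w_acoustic: float = 1.0, w_visual: float = 1.0) -> float:
    """Assumed join cost: weighted boundary mismatch in both modalities."""
    return (w_acoustic * _dist(left.acoustic_end, right.acoustic_start)
            + w_visual * _dist(left.visual_end, right.visual_start))


def select_units(candidates: List[List[BimodalUnit]],
                 w_acoustic: float = 1.0,
                 w_visual: float = 1.0) -> List[BimodalUnit]:
    """Pick one candidate per diphone slot, minimizing the summed join cost.

    candidates[i] holds the database units available for the i-th diphone.
    """
    if not candidates or any(not slot for slot in candidates):
        raise ValueError("every diphone slot needs at least one candidate")

    # best[i][j]: cheapest cumulative cost of a path ending at candidates[i][j]
    best = [[0.0] * len(candidates[0])]
    # back[i][j]: index of the predecessor in slot i-1 on that cheapest path
    back = [[-1] * len(candidates[0])]

    for i in range(1, len(candidates)):
        row_cost, row_back = [], []
        for unit in candidates[i]:
            costs = [best[i - 1][k] + join_cost(prev, unit, w_acoustic, w_visual)
                     for k, prev in enumerate(candidates[i - 1])]
            k_min = min(range(len(costs)), key=costs.__getitem__)
            row_cost.append(costs[k_min])
            row_back.append(k_min)
        best.append(row_cost)
        back.append(row_back)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    return path[::-1]
```

With `w_visual` set to zero this reduces to a purely acoustic join cost; raising it lets a visually smooth transition win over a candidate that is only acoustically closer.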