
Look who's talking: speaker detection using video and audio correlation


Abstract

The visual motion of the mouth and the corresponding audio data generated when a person speaks are highly correlated. This fact has been exploited for lip/speech-reading and for improving speech recognition. We describe a method of automatically detecting a talking person (both spatially and temporally) using video and audio data from a single microphone. The audio-visual correlation is learned using a time-delayed neural network, which is then used to perform a spatio-temporal search for a speaking person. Applications include videoconferencing, video indexing and improving human-computer interaction (HCI). An example HCI application is provided.
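
The spatio-temporal search described in the abstract can be illustrated with a much simpler stand-in. The sketch below does not implement the time-delayed neural network; it swaps in a plain Pearson correlation between a per-frame audio energy signal and per-region visual motion energy, and slides a fixed-length window over time to find where and when the two agree best. The function names (`pearson`, `find_speaker`), the region labels, the window length, and the threshold are all hypothetical choices made for illustration, not details taken from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def find_speaker(audio_energy, region_motion, window=5, threshold=0.8):
    """Search over regions (space) and window start frames (time).

    audio_energy:  per-frame audio energy from the single microphone.
    region_motion: dict mapping a region label to its per-frame motion energy.
    Returns (region, start_frame, score) for the best-scoring window,
    or None if no window reaches the (assumed) threshold.
    """
    best = None
    for start in range(len(audio_energy) - window + 1):
        audio_win = audio_energy[start:start + window]
        for region, motion in region_motion.items():
            score = pearson(audio_win, motion[start:start + window])
            if score >= threshold and (best is None or score > best[2]):
                best = (region, start, score)
    return best


# Toy data: motion in "left" follows the audio, "right" is static.
audio = [0, 1, 0, 1, 0, 1, 0, 1]
regions = {
    "left": [0, 2, 0, 2, 0, 2, 0, 2],
    "right": [1, 1, 1, 1, 1, 1, 1, 1],
}
result = find_speaker(audio, regions)
```

In the toy data the "left" region is selected because its motion rises and falls with the audio, while the static "right" region scores zero. A learned model such as the one the abstract mentions would replace `pearson` with a trained scoring function over a short history of audio and video features.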
