Conference: International Symposium on Advances in Visual Computing
Detection of a Speaker in Video by Combined Analysis of Speech Sound and Mouth Movement

Abstract

We present a robust method for detecting and locating a speaker through joint analysis of speech sound and video images. First, a short segment of speech sound is analyzed to estimate the rate of spoken syllables, and a difference image is formed using the optimal frame distance derived from that rate to detect mouth candidates. The candidates are then tracked to confirm that one of them is the mouth: the rate of mouth movement is estimated from the brightness change profile of the first candidate and, if the two rates agree, the three brightest parts of the resulting difference image are detected as the mouth and eyes. If they do not agree, the second candidate is tracked, and so on. The first-order moment of the power spectrum of the brightness change profile and the lateral shifts observed during tracking are also used to check whether the candidates are facial parts.
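The abstract does not give formulas, so the following is only a minimal Python sketch of how the steps it names could fit together, not the authors' implementation. Every function name is hypothetical, and three choices are assumptions made for illustration: the frame distance is taken as half a syllable period, the mouth-movement rate is read from the peak of the power spectrum, and the two rates are compared with a relative tolerance of 25%.

```python
import numpy as np


def frame_distance(syllable_rate_hz, fps):
    """Frame gap for the difference image.

    Assumption: the mouth goes from open to closed in about half a
    syllable period, so frames that far apart differ the most.
    """
    return max(1, int(round(fps / (2.0 * syllable_rate_hz))))


def difference_image(frames, d):
    """Mean absolute difference between frames that are d apart.

    frames: array of shape (T, H, W) holding grayscale brightness.
    """
    frames = np.asarray(frames, dtype=float)  # avoid uint8 wrap-around
    return np.abs(frames[d:] - frames[:-d]).mean(axis=0)


def power_spectrum(profile, fps):
    """Frequencies (Hz) and power of a mean-removed brightness profile."""
    profile = np.asarray(profile, dtype=float)
    profile = profile - profile.mean()
    power = np.abs(np.fft.rfft(profile)) ** 2
    freqs = np.fft.rfftfreq(profile.size, d=1.0 / fps)
    return freqs, power


def dominant_rate(profile, fps):
    """Assumed estimate of the mouth-movement rate: the spectral peak."""
    freqs, power = power_spectrum(profile, fps)
    return float(freqs[np.argmax(power)])


def spectral_centroid(profile, fps):
    """First-order moment of the power spectrum, in Hz."""
    freqs, power = power_spectrum(profile, fps)
    return float(np.sum(freqs * power) / np.sum(power))


def rates_agree(rate_a, rate_b, rel_tol=0.25):
    """Assumed agreement test: rates within rel_tol of the larger one."""
    return abs(rate_a - rate_b) <= rel_tol * max(rate_a, rate_b)
```

With a syllable rate estimated from the audio, a candidate region would be accepted as the mouth when `rates_agree(syllable_rate, dominant_rate(profile, fps))` holds for its brightness profile. The sketch omits the tracking step and the lateral-shift check.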