IEICE Transactions on Information and Systems

Segmentation of the Speaker's Face Region with Audiovisual Correlation


Abstract

The ability to find the speaker's face region in a video is useful for various applications. In this work, we develop a novel technique to find this region within different time windows that is robust to changes in view, scale, and background. The main thrust of our technique is to integrate audiovisual correlation analysis into a video segmentation framework. We analyze the audiovisual correlation locally by computing the quadratic mutual information between our audiovisual features. The quadratic mutual information is computed from probability density functions estimated by kernel density estimation with adaptive kernel bandwidth. The results of this audiovisual correlation analysis are incorporated into graph cut-based video segmentation to obtain a globally optimal extraction of the speaker's face region. No heuristic threshold needs to be set in this segmentation, because the correlation distributions of the speaker and the background are learned by expectation maximization. Experimental results demonstrate that our method detects the speaker's face region accurately and robustly across different views, scales, and backgrounds.
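To make the correlation measure concrete, the following is a minimal sketch of one common form of quadratic mutual information, the Euclidean-distance variant, computed from Gaussian kernel density estimates. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which variant or which features are used, and the paper uses an adaptive kernel bandwidth, whereas this sketch takes fixed bandwidths `sigma_x` and `sigma_y`. The function names and the scalar (1-D) audio and visual features are hypothetical.

```python
import numpy as np


def gaussian_gram(v, sigma):
    """Pairwise Gaussian kernel values for 1-D samples v.

    Integrating the product of two Gaussian KDE kernels of standard
    deviation sigma yields a Gaussian of variance 2 * sigma**2, which is
    why the doubled variance appears here.
    """
    diff = v[:, None] - v[None, :]
    var = 2.0 * sigma ** 2
    return np.exp(-diff ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def quadratic_mutual_information(x, y, sigma_x, sigma_y):
    """Euclidean-distance QMI between paired 1-D samples x and y.

    Equals the integral of (p(x, y) - p(x) p(y))**2 when all densities
    are fixed-bandwidth Gaussian KDEs (an assumption of this sketch; the
    paper uses adaptive bandwidths).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    kx = gaussian_gram(x, sigma_x)
    ky = gaussian_gram(y, sigma_y)
    v_joint = np.mean(kx * ky)                # integral of p(x, y)^2
    v_marg = np.mean(kx) * np.mean(ky)        # integral of (p(x) p(y))^2
    v_cross = np.mean(kx.mean(axis=1) * ky.mean(axis=1))  # cross term
    return v_joint - 2.0 * v_cross + v_marg
```

Applied to an audio feature sequence and the visual feature sequence of a local image region over the same time window, a larger value would indicate stronger statistical dependence. The cost is quadratic in the number of samples per window.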
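The threshold-free step mentioned in the abstract, learning the correlation distributions of speaker and background by expectation maximization, can be illustrated with a two-component mixture fit. This is a sketch under assumptions: the abstract does not state the distribution family, so 1-D Gaussians are assumed here, and `fit_two_gaussians` and its median-split initialization are hypothetical choices made for illustration.

```python
import numpy as np


def fit_two_gaussians(scores, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture to correlation scores by EM.

    Returns mixing weights, means, variances, and the per-sample posterior
    (responsibility) of each component. Assumes the scores are not all
    identical, so that the median split below yields two non-empty groups.
    """
    s = np.asarray(scores, dtype=float)
    median = np.median(s)
    low, high = s[s <= median], s[s > median]
    mu = np.array([low.mean(), high.mean()])
    var = np.array([low.var(), high.var()]) + 1e-6  # floor avoids collapse
    weights = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: posterior of each component, computed in the log domain.
        log_p = (-0.5 * ((s[:, None] - mu) ** 2 / var + np.log(2.0 * np.pi * var))
                 + np.log(weights))
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate the parameters from the soft assignments.
        nk = resp.sum(axis=0)
        mu = (resp * s[:, None]).sum(axis=0) / nk
        var = (resp * (s[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        weights = nk / len(s)

    return weights, mu, var, resp
```

In this sketch the component with the larger mean would be read as the speaker class. Its posterior, rather than a hand-picked cutoff on the correlation score, could then serve as per-region evidence in a graph cut-based segmentation.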
