Segmentation of the Speaker's Face Region with Audiovisual Correlation

Yuyu LIU; Yoichi SATO

Abstract

The ability to find the speaker's face region in a video is useful for various applications. In this work, we develop a novel technique to find this region within different time windows that is robust to changes in view, scale, and background. The main thrust of our technique is to integrate audiovisual correlation analysis into a video segmentation framework. We analyze the audiovisual correlation locally by computing the quadratic mutual information between our audiovisual features. This computation is based on probability density functions estimated by kernel density estimation with adaptive kernel bandwidth. The results of the audiovisual correlation analysis are incorporated into graph cut-based video segmentation to obtain a globally optimal extraction of the speaker's face region. We avoid setting any heuristic threshold in this segmentation by learning the correlation distributions of the speaker and the background with expectation maximization. Experimental results demonstrate that our method detects the speaker's face region accurately and robustly for different views, scales, and backgrounds.
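To make the correlation step concrete, the following is a minimal sketch of quadratic mutual information estimated from Gaussian kernel density estimates. It is not the authors' implementation. It assumes the Euclidean-distance form of quadratic mutual information, scalar audio and visual features, and a fixed Silverman rule-of-thumb bandwidth in place of the adaptive bandwidth the abstract mentions. All function and variable names are hypothetical.

```python
import math
import random


def gaussian(diff, var):
    """1-D zero-mean Gaussian density with variance `var`, evaluated at `diff`."""
    return math.exp(-diff * diff / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def silverman_bandwidth(samples):
    """Fixed rule-of-thumb bandwidth (assumption: stands in for an adaptive one)."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))
    if std == 0.0:
        raise ValueError("samples must not be constant")
    return 1.06 * std * n ** -0.2


def quadratic_mutual_information(xs, ys):
    """Euclidean-distance QMI: the integral of (p(x, y) - p(x) p(y))^2,
    evaluated in closed form on Gaussian kernel density estimates."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of at least 2 samples")
    n = len(xs)
    # Integrating the product of two Gaussian kernels of variance h^2
    # gives a Gaussian of variance 2 * h^2 in the sample difference.
    var_x = 2.0 * silverman_bandwidth(xs) ** 2
    var_y = 2.0 * silverman_bandwidth(ys) ** 2
    kx = [[gaussian(xs[i] - xs[j], var_x) for j in range(n)] for i in range(n)]
    ky = [[gaussian(ys[i] - ys[j], var_y) for j in range(n)] for i in range(n)]

    # Integral of the squared joint density estimate.
    v_joint = sum(kx[i][j] * ky[i][j] for i in range(n) for j in range(n)) / n**2
    # Row means of each kernel matrix, reused by the two remaining terms.
    row_x = [sum(row) / n for row in kx]
    row_y = [sum(row) / n for row in ky]
    # Integral of the squared product of the marginal estimates.
    v_marginal = (sum(row_x) / n) * (sum(row_y) / n)
    # Integral of the joint estimate times the product of marginals.
    v_cross = sum(rx * ry for rx, ry in zip(row_x, row_y)) / n
    return v_joint - 2.0 * v_cross + v_marginal


if __name__ == "__main__":
    rng = random.Random(0)
    audio = [rng.gauss(0.0, 1.0) for _ in range(200)]
    # Hypothetical visual feature that follows the audio (speaker-like region).
    synced = [a + 0.1 * rng.gauss(0.0, 1.0) for a in audio]
    # Hypothetical visual feature unrelated to the audio (background-like region).
    unrelated = [rng.gauss(0.0, 1.0) for _ in range(200)]
    print("synced:   ", quadratic_mutual_information(audio, synced))
    print("unrelated:", quadratic_mutual_information(audio, unrelated))
```

In a pipeline like the one the abstract describes, a score of this kind would be computed per local region and then passed to the segmentation stage. The pairwise sums cost O(n^2) per region, so this sketch is only practical for short time windows.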

Bibliographic details

Source
IEICE Transactions on Information and Systems, 2010, No. 7, pp. 1965-1975 (11 pages)
Authors
Yuyu LIU; Yoichi SATO
Affiliation
Institute of Industrial Science, The University of Tokyo, Tokyo, 153-8505 Japan
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
speaker detection; audiovisual analysis; segmentation; graph cut
Date added to database: 2022-08-18 00:27:01
