
Look who's talking: speaker detection using video and audio correlation




Abstract

The visual motion of the mouth and the corresponding audio data generated when a person speaks are highly correlated. This fact has been exploited for lip/speech-reading and for improving speech recognition. We describe a method of automatically detecting a talking person (both spatially and temporally) using video and audio data from a single microphone. The audio-visual correlation is learned using a time-delayed neural network, which is then used to perform a spatio-temporal search for a speaking person. Applications include videoconferencing, video indexing and improving human-computer interaction (HCI). An example HCI application is provided.
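The abstract says the audio-visual correlation is learned with a time-delayed neural network and then used for a spatio-temporal search. The Python sketch below illustrates only the spatial search idea under simplifying assumptions. It replaces the learned network with plain Pearson correlation between a per-frame audio energy signal and per-block motion energy (absolute frame differences), tried over a small window of time lags. The function names (motion_energy, lagged_correlation, locate_speaker), the block size and the lag window are hypothetical choices for illustration, not the authors' method.

```python
import numpy as np


def motion_energy(frames: np.ndarray, block: int) -> np.ndarray:
    """Per-block motion energy from absolute frame differences.

    frames: (T, H, W) grayscale video. Returns (T-1, H//block, W//block).
    """
    diffs = np.abs(np.diff(frames.astype(np.float64), axis=0))  # (T-1, H, W)
    t, h, w = diffs.shape
    hb, wb = h // block, w // block
    # Crop to a multiple of the block size, then sum inside each block.
    diffs = diffs[:, : hb * block, : wb * block]
    return diffs.reshape(t, hb, block, wb, block).sum(axis=(2, 4))


def lagged_correlation(a: np.ndarray, v: np.ndarray, max_lag: int) -> float:
    """Largest Pearson correlation between a and v over lags -max_lag..max_lag.

    Assumption: a linear, fixed-lag relation stands in for the learned TDNN.
    """
    n = len(a)
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = a[: n - lag], v[lag:]  # v delayed relative to a
        else:
            x, y = a[-lag:], v[: n + lag]  # a delayed relative to v
        if x.std() == 0 or y.std() == 0:
            continue  # correlation undefined for a constant signal
        best = max(best, float(np.corrcoef(x, y)[0, 1]))
    return best


def locate_speaker(frames, audio_energy, block=8, max_lag=2):
    """Return ((row, col), scores): the block whose motion best tracks the audio.

    audio_energy: one value per video frame, length T (hypothetical input).
    """
    motion = motion_energy(frames, block)  # (T-1, hb, wb)
    audio = audio_energy[1:]  # align with differences between frames t-1 and t
    hb, wb = motion.shape[1:]
    scores = np.full((hb, wb), -1.0)
    for i in range(hb):
        for j in range(wb):
            scores[i, j] = lagged_correlation(audio, motion[:, i, j], max_lag)
    row, col = np.unravel_index(np.argmax(scores), scores.shape)
    return (int(row), int(col)), scores


if __name__ == "__main__":
    # Synthetic clip: one 8x8 region changes in step with the audio energy.
    rng = np.random.default_rng(0)
    audio = rng.random(60)
    frames = rng.normal(0.0, 0.01, (60, 32, 32))
    frames[:, 16:24, 8:16] += np.cumsum(audio)[:, None, None]
    location, _ = locate_speaker(frames, audio)
    print("speaking block:", location)
```

Because the score is a linear correlation over one fixed window, this sketch captures much less than a trained network would. A temporal search, as the abstract describes, would also slide the window along the clip rather than scoring the whole clip at once.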


Bibliographic details

Source: 2000, pp. 1589-1592 (4 pages)
Authors: Cutler, R.; Davis, L.
Original format: PDF
Chinese Library Classification (CLC): radio electronics and telecommunication technology

Similar literature


1. Robust Joint Audio-Video Talker Localization in Video Conferencing Using Reliability Information--II: Bayesian Network Fusion [J]. David Lo, Rafik A. Goubran, Richard M. Dansereau. IEEE Transactions on Instrumentation and Measurement, 2005, no. 4.

2. Robust indoor speaker recognition in a network of audio and video sensors [J]. Eleonora D'Arca, Neil M. Robertson, James R. Hopgood. Signal Processing, 2016, no. deca.

3. On the Applicability of Speaker Diarization to Audio Indexing of Non-Speech and Mixed Non-Speech/Speech Video Soundtracks [J]. Robert Mertens, Po-Sen Huang, Luke Gottlieb, et al. International Journal of Multimedia Data Engineering & Management, 2012, no. 3.

4. Look who's talking: speaker detection using video and audio correlation [C]. Ross Cutler, Larry Davis. IEEE International Conference on Multimedia and Expo, 2000.

5. Multimodal Sensing and Data Processing for Speaker and Emotion Recognition Using Deep Learning Models with Audio, Video and Biomedical Sensors [D]. Farnaz Abtahi, 2018.

6. Look who's talking! Gaze patterns for implicit and explicit audio-visual speech synchrony detection in children with high-functioning autism [O]. Ruth B. Grossman, Erin Steinhart, Teresa Mitchell.

7. Dual-modality Talking-metrics: 3D Visual-Audio Integrated Behaviometric Cues from Speakers [O]. Jie Zhang, Korin Richmond, Robert B. Fisher, 2018.

8. Multiple Target Detection in Video Using Quadratic Multi-Frame Correlation Filtering [R]. R. A. Kerekes, B. V. K. V. Kumar, 2008.

