Detection of a Speaker in Video by Combined Analysis of Speech Sound and Mouth Movement

Abstract

We present a robust method to detect and locate a speaker through joint analysis of speech sound and video images. First, a short segment of speech sound is analyzed to estimate the rate of spoken syllables, and a difference image is formed using the optimal frame distance derived from that rate to detect mouth candidates. The candidates are then tracked to confirm that one of them is the mouth: the rate of mouth movements is estimated from the brightness change profile of the first candidate and, if the two rates agree, the three brightest parts of the resulting difference image are detected as the mouth and eyes. If they do not agree, the second candidate is tracked, and so on. The first-order moment of the power spectrum of the brightness change profile and the lateral shifts during tracking are also used to check whether the candidates are facial parts.
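The rate comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract gives no formulas, so the function names, the half-syllable-period rule for the frame distance, and the 25% agreement tolerance are all assumptions made for illustration. The sketch takes a brightness change profile (one brightness value per frame for a tracked candidate), estimates its dominant rate and the first-order moment of its power spectrum, and compares the visual rate with a syllable rate assumed to come from the audio analysis.

```python
import cmath
import math


def power_spectrum(profile, fps):
    """Return (frequencies in Hz, power) for a mean-removed brightness profile.

    Uses a plain O(n^2) DFT so the sketch needs only the standard library.
    """
    n = len(profile)
    mean = sum(profile) / n
    centered = [x - mean for x in profile]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):  # skip the DC bin
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        freqs.append(k * fps / n)
        power.append(abs(coeff) ** 2)
    return freqs, power


def dominant_rate(profile, fps):
    """Frequency (Hz) of the strongest non-DC spectral peak."""
    freqs, power = power_spectrum(profile, fps)
    return freqs[max(range(len(power)), key=power.__getitem__)]


def first_order_moment(profile, fps):
    """Power-weighted mean frequency (Hz) of the profile's spectrum."""
    freqs, power = power_spectrum(profile, fps)
    total = sum(power)
    if total == 0:
        return 0.0
    return sum(f * p for f, p in zip(freqs, power)) / total


def frame_distance(syllable_rate, fps):
    """ASSUMPTION: difference frames half a syllable period apart.

    The abstract only says the distance is derived from the syllable rate.
    """
    return max(1, round(fps / (2.0 * syllable_rate)))


def rates_agree(audio_rate, visual_rate, tolerance=0.25):
    """ASSUMPTION: relative tolerance; the abstract states no threshold."""
    return abs(audio_rate - visual_rate) <= tolerance * audio_rate
```

In use, a candidate would be accepted as the mouth when `rates_agree(syllable_rate, dominant_rate(profile, fps))` holds, with `first_order_moment` available as an extra check that the profile's energy sits in a plausible speech-rate band; otherwise the next candidate would be tried.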

Bibliographic details

Source: International Symposium on Advances in Visual Computing | 2007 | 9 pages
Conference location: Lake Tahoe, NV (US)
Author: Osamu Ikeda
Original format: PDF
Chinese Library Classification: Information processing

Similar literature

1. Tolerance for audiovisual asynchrony is enhanced by the spectrotemporal fidelity of the speaker's mouth movements and speech [J]. Shahin Antoine J., Shen Stanley, Kerlin Jess R. Language, Cognition and Neuroscience. 2017, No. 9
2. Volubility, Consonant Emergence, and Syllabic Structure in Infants and Toddlers Later Diagnosed With Childhood Apraxia of Speech, Speech Sound Disorder, and Typical Development: A Retrospective Video Analysis [J]. Overby Megan S., Caspari Susan S., Schreiber James. Journal of Speech, Language, and Hearing Research: JSLHR. 2019, No. 6
3. Detection of a Speaker in Video by Combined Analysis of Speech Sound and Mouth Movement [C]. Osamu Ikeda. International Symposium on Advances in Visual Computing (ISVC 2007); 26-28 November 2007; Lake Tahoe, NV (US). 2007
4. The role of "focus of attention" on the learning of non-native speech sounds: English speakers learning of Mandarin Chinese tones [D]. Almelaifi, Ruba B. 2012
5. Tolerance for audiovisual asynchrony is enhanced by the spectrotemporal fidelity of the speaker's mouth movements and speech [O]. Antoine J Shahin, Stanley Shen, Jess R Kerlin
6. Tolerance for audiovisual asynchrony is enhanced by the spectrotemporal fidelity of the speaker's mouth movements and speech [O]. Antoine J. Shahin, Stanley Shen, Jess R. Kerlin. 2017
7. (abstract) Synthesis of Speaker Facial Movements to Match Selected Speech Sequences [R]. Scott, Kenneth C. 1994
