Audio-visual Evaluation and Detection of Word Prominence in a Human-Machine Interaction Scenario


Abstract

This paper investigates the audio-visual correlates of word prominence and its detection. Subjects interacted with a computer in a small game that created a broad and a narrow focus condition. Audio-visual recordings were made with a distant microphone and without visual markers. The acoustic features calculated were duration, intensity, fundamental frequency, and spectral emphasis. From the visual channel, head movements and image-transformation-based features of the mouth region were extracted. First, the results show that the extracted features differ significantly between the two focus conditions (broad and narrow). Classification results then demonstrate that the two conditions can be differentiated without knowledge of the word identity, with accuracies of approx. 80%. Furthermore, the visual channel by itself yields accuracies of approx. 65%, notably better than chance, and a combination of both modalities increases performance to approx. 85%.
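As a rough illustration of two of the acoustic features named in the abstract (duration and intensity), the sketch below computes them for a single word segment. It is not the paper's implementation: the function name `word_features`, the use of RMS energy in dB as the intensity measure, the word boundaries, and the synthetic test signal are all assumptions made for this example.

```python
import math


def word_features(samples, sample_rate, start_s, end_s):
    """Return (duration in seconds, RMS intensity in dB) for one word segment.

    Hypothetical helper: the word boundaries (start_s, end_s) are assumed to
    come from some external alignment, which the abstract does not describe.
    """
    first = int(round(start_s * sample_rate))
    last = int(round(end_s * sample_rate))
    segment = samples[first:last]
    if not segment:
        raise ValueError("empty word segment")
    duration = len(segment) / sample_rate
    rms = math.sqrt(sum(x * x for x in segment) / len(segment))
    # Guard against log10(0) for a completely silent segment.
    intensity_db = 20.0 * math.log10(max(rms, 1e-12))
    return duration, intensity_db


# Synthetic 1 s signal at 8 kHz: a quiet first half and a louder second half,
# standing in for a less prominent and a more prominent word.
sample_rate = 8000
signal = [
    (0.1 if n < 4000 else 0.5) * math.sin(2 * math.pi * 200 * n / sample_rate)
    for n in range(sample_rate)
]
quiet = word_features(signal, sample_rate, 0.0, 0.5)
loud = word_features(signal, sample_rate, 0.5, 1.0)
print(quiet, loud)
```

In a real setting, per-word values like these would be collected together with the other acoustic and visual features and passed to a classifier; the abstract does not specify which classifier was used.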


Bibliographic details

Source
Annual Conference of the International Speech Communication Association | 2012 | pp. 2387-2390 | 4 pages
Author
Martin Heckmann
Original format
PDF
Keywords
prosody; prominence; visual; audio-visual; spectral emphasis; lip movement; head movement

