Annual Conference of the International Speech Communication Association
Audio-visual Evaluation and Detection of Word Prominence in a Human-Machine Interaction Scenario

Abstract

This paper investigates the audio-visual correlates of word prominence and its detection. Subjects interacted with a computer in a small game that created a broad and a narrow focus condition. Audio-visual recordings were made with a distant microphone and without visual markers. The acoustic features calculated were duration, intensity, fundamental frequency, and spectral emphasis. From the visual channel, head movements and image-transformation-based features of the mouth region were extracted. First, the results show that the extracted features differ significantly between the two focus conditions (broad and narrow). Classification results then demonstrate that the two conditions can be differentiated without knowledge of the word identity, with accuracies of approx. 80%. Furthermore, the visual channel by itself yields accuracies notably better than chance (approx. 65%), and a combination of both modalities increases performance to approx. 85%.
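To make two of the acoustic features named in the abstract concrete, here is a minimal sketch of how per-word duration and intensity could be computed from a waveform and word boundaries. The abstract does not say how the authors computed these features, so the function name `word_features`, the RMS-in-decibels definition of intensity, and the synthetic test signal are all assumptions made for illustration only.

```python
import math

def word_features(samples, sample_rate, start_s, end_s):
    """Return (duration in seconds, RMS intensity in dB) for one word segment.

    Hypothetical helper: the boundaries start_s/end_s are assumed to come
    from some external word alignment, which the abstract does not describe.
    """
    if end_s <= start_s:
        raise ValueError("end_s must be greater than start_s")
    lo = int(round(start_s * sample_rate))
    hi = int(round(end_s * sample_rate))
    segment = samples[lo:hi]
    if not segment:
        raise ValueError("word segment contains no samples")
    rms = math.sqrt(sum(x * x for x in segment) / len(segment))
    # Floor the RMS so a fully silent segment does not trigger log10(0).
    intensity_db = 20.0 * math.log10(max(rms, 1e-10))
    return end_s - start_s, intensity_db

# Synthetic 1 s signal at 16 kHz (assumed values): a quiet first half
# followed by a second half with four times the amplitude.
sample_rate = 16000
signal = [0.1 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(8000)]
signal += [0.4 * math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(8000)]

quiet = word_features(signal, sample_rate, 0.0, 0.5)
loud = word_features(signal, sample_rate, 0.5, 1.0)
```

In this toy signal the second segment has the same duration as the first but a higher intensity, which is the kind of per-word contrast a classifier could use to separate focus conditions. Fundamental frequency, spectral emphasis, and the visual features would need dedicated signal and image processing and are not sketched here.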