Conference: IEEE International Conference on Robotics & Automation

Audio-visual keyword spotting based on adaptive decision fusion under noisy conditions for human-robot interaction

Abstract

Keyword spotting (KWS), the detection of keywords in unconstrained speech, offers a natural, straightforward and friendly interface for human-robot interaction (HRI). However, most keyword spotters lack noise robustness when deployed in real-world environments where the noise changes dramatically. Because visual information is unaffected by acoustic noise, it can serve as a complementary cue to improve noise robustness. In this paper, a novel audio-visual keyword spotting approach based on adaptive decision fusion under noisy conditions is proposed. To accurately represent the appearance and movement of the mouth region, an improved local binary pattern from three orthogonal planes (ILBP-TOP) is proposed. In addition, a parallel two-step recognition based on acoustic and visual keyword candidates is performed, generating corresponding acoustic and visual scores for each keyword candidate. A neural network driven by the reliabilities of the two modalities generates optimal weights for combining the acoustic and visual contributions under diverse noise conditions. Experiments show that the proposed decision-fusion-based audio-visual keyword spotter significantly improves noise robustness and outperforms a feature-fusion-based audio-visual spotter. Additionally, ILBP-TOP outperforms LBP-TOP.
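The abstract does not say how ILBP-TOP modifies the descriptor, so the following is only a minimal Python sketch of the standard LBP-TOP baseline it is compared against. It assumes a grayscale mouth-region clip stored as nested lists indexed `[t][y][x]`. For every interior pixel, an 8-bit code is computed on each of the three orthogonal planes (XY for appearance, XT and YT for motion), and the three 256-bin histograms are normalised and concatenated. The simplifications (radius 1 on every axis, no interpolation, no uniform-pattern mapping, no division of the mouth into blocks) and the names `lbp_code`, `lbp_top` and `RING` are illustrative assumptions, not details from the paper.

```python
from typing import List, Sequence

Volume = Sequence[Sequence[Sequence[int]]]  # grayscale clip indexed [t][y][x]

# Hypothetical neighbourhood: 8 neighbours on a radius-1 square ring,
# listed in a fixed circular order (first offset, second offset).
RING = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(center: int, neighbours: List[int]) -> int:
    """Basic LBP code: bit i is set when neighbour i >= the centre value."""
    code = 0
    for i, value in enumerate(neighbours):
        if value >= center:
            code |= 1 << i
    return code


def lbp_top(volume: Volume) -> List[float]:
    """Concatenated, normalised 256-bin LBP histograms from the XY, XT and YT planes."""
    t_len, h, w = len(volume), len(volume[0]), len(volume[0][0])
    hists = [[0] * 256 for _ in range(3)]
    # Only interior voxels have a full ring of neighbours on every plane.
    for t in range(1, t_len - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                c = volume[t][y][x]
                # XY plane: appearance within the current frame.
                xy = [volume[t][y + dy][x + dx] for dy, dx in RING]
                # XT plane: horizontal motion (row y fixed, time varies).
                xt = [volume[t + dt][y][x + dx] for dt, dx in RING]
                # YT plane: vertical motion (column x fixed, time varies).
                yt = [volume[t + dt][y + dy][x] for dt, dy in RING]
                for hist, nb in zip(hists, (xy, xt, yt)):
                    hist[lbp_code(c, nb)] += 1
    feature: List[float] = []
    for hist in hists:
        total = sum(hist) or 1  # avoid division by zero for tiny clips
        feature.extend(count / total for count in hist)
    return feature
```

A clip whose brightness rises frame by frame leaves the XY code unchanged but shifts the XT and YT codes, which is the temporal information a single-frame LBP cannot capture.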
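The adaptive decision fusion step can be pictured as a per-candidate weighted sum of the acoustic and visual scores, where the acoustic weight depends on how reliable each modality currently is. The paper obtains this weight from a trained neural network. The sketch below is hypothetical and replaces that network with a single fixed logistic unit on the reliability difference (`slope` is an arbitrary illustrative constant, not a trained value). It also assumes that both scores lie on a common [0, 1] scale and that the reliabilities have already been estimated. The names `fusion_weight`, `fuse_scores`, `spot` and the detection threshold are invented for this example.

```python
import math
from typing import Dict, List


def fusion_weight(acoustic_rel: float, visual_rel: float, slope: float = 4.0) -> float:
    """Map modality reliabilities to an acoustic weight in (0, 1).

    Hypothetical stand-in for the paper's neural network: one logistic
    unit on the reliability difference, with an untrained `slope`.
    """
    return 1.0 / (1.0 + math.exp(-slope * (acoustic_rel - visual_rel)))


def fuse_scores(acoustic: Dict[str, float], visual: Dict[str, float],
                weight: float) -> Dict[str, float]:
    """Per candidate: weight * acoustic + (1 - weight) * visual."""
    return {kw: weight * acoustic[kw] + (1.0 - weight) * visual[kw]
            for kw in acoustic.keys() & visual.keys()}


def spot(acoustic: Dict[str, float], visual: Dict[str, float],
         acoustic_rel: float, visual_rel: float, threshold: float) -> List[str]:
    """Return the keyword candidates whose fused score reaches the threshold."""
    w = fusion_weight(acoustic_rel, visual_rel)
    fused = fuse_scores(acoustic, visual, w)
    return sorted(kw for kw, score in fused.items() if score >= threshold)


if __name__ == "__main__":
    acoustic = {"stop": 0.8, "go": 0.3}  # made-up scores for illustration
    visual = {"stop": 0.2, "go": 0.9}
    # Noisy audio: low acoustic reliability, so the visual scores dominate.
    print(spot(acoustic, visual, acoustic_rel=0.2, visual_rel=0.8, threshold=0.5))  # ['go']
    # Clean audio: high acoustic reliability, so the acoustic scores dominate.
    print(spot(acoustic, visual, acoustic_rel=0.9, visual_rel=0.5, threshold=0.5))  # ['stop']
```

The same candidate scores yield different detections once the reliabilities change. This is the behaviour that motivates adapting the weights to the noise condition rather than fixing them in advance.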