Pattern Recognition Letters

Voice activity detection and speaker localization using audiovisual cues

Abstract

This paper proposes a multimodal approach to distinguish silence from speech and, when speech is present, to identify the location of the active speaker. In our approach, a video camera is used to track the faces of the participants, and a microphone array is used to estimate the Sound Source Location (SSL) with the Steered Response Power with Phase Transform (SRP-PHAT) method. The audiovisual cues are combined, and two competing Hidden Markov Models (HMMs) are used to detect either silence or the presence of a person speaking. If speech is detected, the corresponding HMM also provides the spatio-temporally coherent location of the speaker. Experimental results show that incorporating the HMMs improves the results over unimodal SRP-PHAT, and that including video cues yields further improvement.
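SRP-PHAT, named above as the acoustic localization step, scores each candidate source position by summing, over all microphone pairs, the phase-transform-weighted cross-power spectrum steered to the time difference of arrival (TDOA) that the position would produce. The following is a minimal pure-Python sketch of that standard computation, not the authors' implementation. It assumes 2-D coordinates, a single analysis frame, a naive DFT, and hypothetical helper names (`dft`, `srp_phat_map`). The face tracking, audiovisual fusion, and competing HMMs described in the abstract are not modelled.

```python
import cmath
import itertools
import math
import random


def dft(x):
    """Naive O(N^2) DFT (hypothetical helper; adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def srp_phat_map(frames, fs, mic_positions, candidates, c=343.0):
    """Return the SRP-PHAT power at each candidate location (hypothetical helper).

    frames: one equal-length list of samples per microphone, same time window.
    fs: sampling rate in Hz.
    mic_positions, candidates: (x, y) coordinates in metres.
    c: assumed speed of sound in m/s.
    """
    n = len(frames[0])
    spectra = [dft(frame) for frame in frames]
    pairs = list(itertools.combinations(range(len(frames)), 2))

    # PHAT weighting: keep only the phase of each cross-spectrum X_k * conj(X_l).
    phat = {}
    for k, l in pairs:
        weighted = []
        for a, b in zip(spectra[k], spectra[l]):
            cross = a * b.conjugate()
            weighted.append(cross / abs(cross) if abs(cross) > 1e-12 else 0j)
        phat[(k, l)] = weighted

    bins = range(1, n // 2)  # positive frequencies, skipping DC and Nyquist
    powers = []
    for q in candidates:
        # Propagation delay (s) from the candidate point to every microphone.
        delays = [math.dist(q, m) / c for m in mic_positions]
        power = 0.0
        for k, l in pairs:
            tdoa = delays[k] - delays[l]  # expected TDOA for this pair
            for f in bins:
                omega = 2 * math.pi * f * fs / n  # angular frequency of bin f
                # Steering cancels the phase lag e^{-j*omega*tdoa} a source at q causes.
                power += (phat[(k, l)][f] * cmath.exp(1j * omega * tdoa)).real
        powers.append(power)
    return powers


# Demo: a broadband source at (3, 4) seen by three microphones.
random.seed(0)
n, fs = 64, 16000.0
c = fs  # demo-only assumption: 1 m of path equals exactly 1 sample of delay
mics = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
source = (3.0, 4.0)  # 5, 4 and 3 samples from the three microphones
s = [random.gauss(0.0, 1.0) for _ in range(n)]
frames = []
for m in mics:
    d = round(math.dist(source, m) * fs / c)
    frames.append([s[(t - d) % n] for t in range(n)])  # circular integer delay

cands = [(-3.0, 4.0), (3.0, 4.0), (3.0, -4.0), (-3.0, -4.0)]
powers = srp_phat_map(frames, fs, mics, cands, c=c)
best = cands[powers.index(max(powers))]
print(best)  # the true source position, (3.0, 4.0)
```

The demo uses circular integer-sample delays and a deliberately unphysical `c` equal to `fs`, so each metre of path maps to exactly one sample. Under those assumptions the true position scores one unit per frequency bin per microphone pair (3 × 31 = 93 here), while candidates whose TDOAs do not match score lower. A practical system would evaluate a dense candidate grid on every frame and use an FFT instead of the naive DFT.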