Speech Communication

Improvement of recognition of simultaneous speech signals using AV integration and scattering theory for humanoid robots



Abstract

This paper presents a method to improve recognition of three simultaneous speech signals by a humanoid robot equipped with a pair of microphones. In such situations, sound separation and automatic speech recognition (ASR) of the separated speech signals are difficult, because the signal-to-noise ratio is quite low (around −3 dB) and the noise is non-stationary due to the interfering voices. To improve recognition of three simultaneous speech signals, two key ideas are introduced. One is two-layered audio-visual integration of both name (ID) and location, that is, speech and face recognition, and speech and face localization. The other is acoustical modeling of the humanoid head by scattering theory. Sound sources are separated in real time by an active direction-pass filter (ADPF), which extracts sounds from a specified direction by using the interaural phase/intensity differences estimated by scattering theory. Since the features of separated sounds vary according to the sound direction, multiple direction- and speaker-dependent acoustic models are used. The system integrates ASR results by using the sound direction and speaker information provided by face recognition, as well as confidence measures of the ASR results, to select the best one. The resulting system shows an improvement of about 10% on average in the recognition of three simultaneous speech signals, where three speakers were located around the humanoid on a 1 m radius half circle, one of them in front of the robot (angle 0°) and the other two at symmetrical positions (±θ) varying in 10° steps from 0° to 90°. (C) 2004 Elsevier B.V. All rights reserved.
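The core of the ADPF described above is to keep only those time-frequency bins whose interaural phase difference (IPD) matches the value expected for the target direction. The following is a minimal free-field sketch of that idea; it uses a simple two-microphone delay model rather than the paper's scattering-theory head model, and all parameter values (microphone spacing, pass width) are illustrative assumptions.

```python
import numpy as np

def direction_pass_filter(left, right, fs, target_deg, mic_dist=0.18,
                          n_fft=512, pass_width=0.5):
    """Toy direction-pass filter: keep time-frequency bins whose
    interaural phase difference (IPD) is close to the value expected
    for the target direction.

    NOTE: a free-field delay model replaces the paper's
    scattering-theory head model; mic_dist and pass_width (radians)
    are assumed values for illustration only.
    """
    c = 343.0  # speed of sound, m/s
    # Expected interaural time delay for the target direction
    itd = mic_dist * np.sin(np.deg2rad(target_deg)) / c

    hop = n_fft // 2
    n_frames = (len(left) - n_fft) // hop + 1
    win = np.hanning(n_fft)
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    expected_ipd = 2 * np.pi * freqs * itd
    out = np.zeros(len(left))

    for t in range(n_frames):
        s = t * hop
        L = np.fft.rfft(win * left[s:s + n_fft])
        R = np.fft.rfft(win * right[s:s + n_fft])
        ipd = np.angle(L * np.conj(R))
        # Wrapped phase distance between observed and expected IPD
        dist = np.abs(np.angle(np.exp(1j * (ipd - expected_ipd))))
        mask = dist < pass_width
        # Overlap-add the masked reconstruction
        out[s:s + n_fft] += np.fft.irfft(L * mask) * win

    return out
```

With identical left/right channels and a frontal target (0°), the expected IPD is zero everywhere, so the filter passes the signal essentially unchanged; off-target directions would attenuate bins whose phase differences disagree with the steering angle.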
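The final selection step, in which ASR hypotheses are integrated with the direction and speaker-ID evidence from face localization and face recognition, can be sketched as a simple audio-visual scoring rule. The structure below is hypothetical: the field names, tolerance, and agreement weights are assumptions for illustration, not the paper's actual weighting.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One ASR result from a direction- and speaker-dependent model."""
    text: str
    asr_confidence: float  # ASR confidence measure, 0..1
    direction_deg: float   # sound direction of the separated source
    speaker_id: str        # speaker assumed by the acoustic model

def select_best(hyps, face_direction_deg, face_speaker_id,
                dir_tolerance=15.0):
    """Pick the hypothesis best supported by audio-visual evidence.

    Hypothetical scoring: the ASR confidence is boosted when the sound
    direction agrees with face localization and when the speaker ID
    agrees with face recognition (weights of 0.2 are assumed).
    """
    def score(h):
        s = h.asr_confidence
        if abs(h.direction_deg - face_direction_deg) <= dir_tolerance:
            s += 0.2  # direction agrees with face localization
        if h.speaker_id == face_speaker_id:
            s += 0.2  # ID agrees with face recognition
        return s
    return max(hyps, key=score)
```

In this scheme a hypothesis with lower raw ASR confidence can still win if both the localization and the identity cues from the vision side support it, which is the intent of the two-layered audio-visual integration.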
