Speech Communication

Improvement of recognition of simultaneous speech signals using AV integration and scattering theory for humanoid robots



Abstract

This paper presents a method to improve recognition of three simultaneous speech signals by a humanoid robot equipped with a pair of microphones. In such situations, sound separation and automatic speech recognition (ASR) of the separated speech signals are difficult, because the signal-to-noise ratio is quite low (around −3 dB) and the noise is non-stationary due to the interfering voices. To improve recognition of three simultaneous speech signals, two key ideas are introduced. One is two-layered audio-visual integration of both name (ID) and location, that is, speech and face recognition, and speech and face localization. The other is acoustical modeling of the humanoid head by scattering theory. Sound sources are separated in real time by an active direction-pass filter (ADPF), which extracts sounds from a specified direction by using the interaural phase/intensity differences estimated by scattering theory. Since the features of separated sounds vary according to the sound direction, multiple direction- and speaker-dependent acoustic models are used. The system integrates ASR results by using the sound direction and speaker information provided by face recognition, as well as confidence measures of the ASR results, to select the best one. The resulting system shows an improvement of about 10% on average in the recognition of three simultaneous speech signals, where three speakers were located around the humanoid on a 1 m radius half circle, one of them in front of the robot (angle 0°) and the other two at symmetrical positions (±θ) varying in 10° steps from 0° to 90°. (C) 2004 Elsevier B.V. All rights reserved.
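The core of the ADPF described above is to keep only those time-frequency bins whose interaural phase difference (IPD) matches the value expected for the target direction. The following is a minimal free-field sketch of that idea; it uses a simple two-microphone delay model rather than the paper's scattering-theory head model, and all parameter values (microphone spacing, pass width) are illustrative assumptions.

```python
import numpy as np

def direction_pass_filter(left, right, fs, target_deg, mic_dist=0.18,
                          n_fft=512, pass_width=0.5):
    """Toy direction-pass filter: keep time-frequency bins whose
    interaural phase difference (IPD) is close to the value expected
    for the target direction.

    NOTE: a free-field delay model replaces the paper's
    scattering-theory head model; mic_dist and pass_width (radians)
    are assumed values for illustration only.
    """
    c = 343.0  # speed of sound, m/s
    # Expected interaural time delay for the target direction
    itd = mic_dist * np.sin(np.deg2rad(target_deg)) / c

    hop = n_fft // 2
    n_frames = (len(left) - n_fft) // hop + 1
    win = np.hanning(n_fft)
    freqs = np.fft.rfftfreq(n_fft, 1.0 / fs)
    expected_ipd = 2 * np.pi * freqs * itd
    out = np.zeros(len(left))

    for t in range(n_frames):
        s = t * hop
        L = np.fft.rfft(win * left[s:s + n_fft])
        R = np.fft.rfft(win * right[s:s + n_fft])
        ipd = np.angle(L * np.conj(R))
        # Wrapped phase distance between observed and expected IPD
        dist = np.abs(np.angle(np.exp(1j * (ipd - expected_ipd))))
        mask = dist < pass_width
        # Overlap-add the masked reconstruction
        out[s:s + n_fft] += np.fft.irfft(L * mask) * win

    return out
```

With identical left/right channels and a frontal target (0°), the expected IPD is zero everywhere, so the filter passes the signal essentially unchanged; off-target directions would attenuate bins whose phase differences disagree with the steering angle.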
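The final selection step, in which ASR hypotheses are integrated with the direction and speaker-ID evidence from face localization and face recognition, can be sketched as a simple audio-visual scoring rule. The structure below is hypothetical: the field names, tolerance, and agreement weights are assumptions for illustration, not the paper's actual weighting.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """One ASR result from a direction- and speaker-dependent model."""
    text: str
    asr_confidence: float  # ASR confidence measure, 0..1
    direction_deg: float   # sound direction of the separated source
    speaker_id: str        # speaker assumed by the acoustic model

def select_best(hyps, face_direction_deg, face_speaker_id,
                dir_tolerance=15.0):
    """Pick the hypothesis best supported by audio-visual evidence.

    Hypothetical scoring: the ASR confidence is boosted when the sound
    direction agrees with face localization and when the speaker ID
    agrees with face recognition (weights of 0.2 are assumed).
    """
    def score(h):
        s = h.asr_confidence
        if abs(h.direction_deg - face_direction_deg) <= dir_tolerance:
            s += 0.2  # direction agrees with face localization
        if h.speaker_id == face_speaker_id:
            s += 0.2  # ID agrees with face recognition
        return s
    return max(hyps, key=score)
```

In this scheme a hypothesis with lower raw ASR confidence can still win if both the localization and the identity cues from the vision side support it, which is the intent of the two-layered audio-visual integration.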
