Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds



Abstract

Research on noise-robust speech recognition has mainly focused on dealing with relatively stationary noise, which may differ from the noise conditions found in most living environments. In this paper, we introduce a recognition system that can recognize speech in the presence of multiple rapidly time-varying noise sources, as found in a typical family living room. To deal with such severe noise conditions, our recognition system exploits all available information about speech and noise, that is, spatial (directional), spectral and temporal information. This is realized with a model-based speech enhancement pre-processor, which consists of two complementary elements: a multi-channel speech-noise separation method that exploits spatial and spectral information, followed by a single-channel enhancement algorithm that uses the long-term temporal characteristics of speech obtained from clean speech examples. Moreover, to compensate for any mismatch that may remain between the enhanced speech and the acoustic model, our system employs an adaptation technique that combines conventional maximum likelihood linear regression with dynamic adaptive compensation of the variances of the Gaussians of the acoustic model. Our proposed system approaches human performance levels by greatly improving the audible quality of the speech and substantially improving keyword recognition accuracy.
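To make the variance-compensation idea in the abstract concrete: during decoding, each Gaussian of the acoustic model can be evaluated with its variance inflated by a per-frame estimate of the uncertainty left by the enhancement front-end, so that poorly enhanced frames contribute less sharply to the recognition decision. The following is a minimal Python/NumPy sketch of that generic uncertainty-compensation step, assuming diagonal covariances; the function and variable names are illustrative and do not reproduce the authors' exact formulation.

    import numpy as np

    def log_gauss_diag(x, mean, var):
        # Log-likelihood of one feature frame under a diagonal-covariance Gaussian.
        return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

    def log_gauss_variance_compensated(x, mean, model_var, uncertainty_var):
        # Same likelihood, but a per-frame estimate of the enhancement error
        # variance is added to the model variance before scoring the frame.
        return log_gauss_diag(x, mean, model_var + uncertainty_var)

    # Toy usage with random numbers standing in for MFCC-like features.
    rng = np.random.default_rng(0)
    dim = 13
    x = rng.normal(size=dim)             # enhanced feature vector for one frame
    mean = np.zeros(dim)                 # Gaussian mean from the acoustic model
    model_var = np.ones(dim)             # Gaussian variance from the acoustic model
    uncertainty_var = np.full(dim, 0.5)  # hypothetical enhancement-error variance

    print(log_gauss_diag(x, mean, model_var))                                   # uncompensated score
    print(log_gauss_variance_compensated(x, mean, model_var, uncertainty_var))  # compensated score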

Bibliographic information

  • Source
    Computer Speech and Language | 2013, Issue 3 | pp. 851-873 | 23 pages
  • Author affiliations

    NTT Communication Science Laboratories, NTT Corporation, 2-4, Hikaridai, Seika-cho, Souraku-gun, Kyoto 619-0237, Japan;

    NTT Communication Science Laboratories, Media Information Laboratory, Processing Research Group, 2-4, Hikaridai, Seika-cho, Keihanna Science City, Kyoto 619-0237, Japan;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification
  • Keywords

    robust ASR; model-based speech enhancement; example-based speech enhancement; model adaptation; dynamic variance adaptation

