Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds



Abstract

Research on noise-robust speech recognition has mainly focused on dealing with relatively stationary noise, which may differ from the noise conditions found in most living environments. In this paper, we introduce a recognition system that can recognize speech in the presence of multiple rapidly time-varying noise sources, as found in a typical family living room. To deal with such severe noise conditions, our recognition system exploits all available information about speech and noise, that is, spatial (directional), spectral and temporal information. This is realized with a model-based speech enhancement pre-processor, which consists of two complementary elements: a multi-channel speech-noise separation method that exploits spatial and spectral information, followed by a single-channel enhancement algorithm that uses the long-term temporal characteristics of speech obtained from clean speech examples. Moreover, to compensate for any mismatch that may remain between the enhanced speech and the acoustic model, our system employs an adaptation technique that combines conventional maximum likelihood linear regression with dynamic adaptive compensation of the variances of the Gaussians of the acoustic model. Our proposed system approaches human performance levels by greatly improving the audible quality of the speech and substantially improving keyword recognition accuracy.
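To make the variance-compensation idea in the abstract concrete: during decoding, each Gaussian of the acoustic model can be evaluated with its variance inflated by a per-frame estimate of the uncertainty left by the enhancement front-end, so that poorly enhanced frames contribute less sharply to the recognition decision. The following is a minimal Python/NumPy sketch of that generic uncertainty-compensation step, assuming diagonal covariances; the function and variable names are illustrative and do not reproduce the authors' exact formulation.

    import numpy as np

    def log_gauss_diag(x, mean, var):
        # Log-likelihood of one feature frame under a diagonal-covariance Gaussian.
        return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

    def log_gauss_variance_compensated(x, mean, model_var, uncertainty_var):
        # Same likelihood, but a per-frame estimate of the enhancement error
        # variance is added to the model variance before scoring the frame.
        return log_gauss_diag(x, mean, model_var + uncertainty_var)

    # Toy usage with random numbers standing in for MFCC-like features.
    rng = np.random.default_rng(0)
    dim = 13
    x = rng.normal(size=dim)             # enhanced feature vector for one frame
    mean = np.zeros(dim)                 # Gaussian mean from the acoustic model
    model_var = np.ones(dim)             # Gaussian variance from the acoustic model
    uncertainty_var = np.full(dim, 0.5)  # hypothetical enhancement-error variance

    print(log_gauss_diag(x, mean, model_var))                                   # uncompensated score
    print(log_gauss_variance_compensated(x, mean, model_var, uncertainty_var))  # compensated score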

Bibliographic information

  • Source
    Computer Speech and Language | 2013, Issue 3 | pp. 851-873 | 23 pages
  • Author affiliations

    NTT Communication Science Laboratories, NTT Corporation, 2-4, Hikaridai, Seika-cho, Souraku-gun, Kyoto 619-0237, Japan;

    NTT Communication Science Laboratories, Media Information Laboratory, Processing Research Group, 2-4, Hikaridai, Seika-cho, Keihanna Science City, Kyoto 619-0237, Japan;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification
  • Keywords

    robust ASR; model-based speech enhancement; example-based speech enhancement; model adaptation; dynamic variance adaptation

