Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

A Generalized Nonnegative Tensor Factorization Approach for Distant Speech Recognition With Distributed Microphones



Abstract

Automatic speech recognition (ASR) using distant (far-field) microphones is a challenging task in which room reverberation is one of the primary causes of performance degradation. This study proposes a multichannel spectral enhancement method for reverberation-robust ASR using distributed microphones. The proposed method uses nonnegative tensor factorization to identify the clean speech component from a set of observed reverberant spectrograms, one per channel. The general family of alpha–beta divergences is used for the tensor decomposition task, which gives the algorithm added flexibility and is shown to improve results in highly reverberant scenarios. Unlike many conventional array processing solutions, the proposed method does not require closely spaced microphones and is independent of source and microphone locations. The algorithm can automatically adapt to unbalanced direct-to-reverberation ratios among different channels, which is useful in blind scenarios in which no information is available about source-to-microphone distances. For a medium-vocabulary distant ASR task based on TIMIT utterances, and using clean-trained deep neural network acoustic models, absolute word error rate (WER) improvements of +17.2%, +20.7%, and +23.2% are achieved in single-channel, two-channel, and four-channel scenarios, respectively.
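
The abstract names the alpha-beta divergence family as the cost for the tensor decomposition but does not define it. As a minimal sketch, assuming the commonly used generic-case definition (alpha, beta, and alpha + beta all nonzero) and strictly positive inputs, the divergence between an observed spectrum and a model spectrum could be computed as below. The function name `ab_divergence` and the toy vectors are hypothetical illustrations, not the authors' implementation, and the factorization updates themselves are not shown.

```python
def ab_divergence(p, q, alpha, beta):
    """Alpha-beta divergence D(p || q), generic case only.

    Assumptions (not taken from the source): alpha, beta, and alpha + beta
    are all nonzero, and p and q are equal-length sequences of strictly
    positive numbers, e.g. magnitude-spectrogram bins with a small floor.
    """
    if alpha == 0 or beta == 0 or alpha + beta == 0:
        raise ValueError("generic case only: alpha, beta, alpha + beta must be nonzero")
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")

    s = alpha + beta
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i <= 0 or q_i <= 0:
            raise ValueError("entries must be strictly positive")
        # Per-element term: p^alpha * q^beta minus the two weighted power terms.
        total += (p_i ** alpha) * (q_i ** beta) \
            - (alpha / s) * p_i ** s \
            - (beta / s) * q_i ** s
    return -total / (alpha * beta)


# Hypothetical toy "spectra", not data from the study.
observed = [1.0, 2.0]
model = [2.0, 4.0]
half_squared_error = ab_divergence(observed, model, 1.0, 1.0)
self_divergence = ab_divergence(observed, observed, 0.5, 0.5)
```

With alpha = beta = 1 this definition reduces to half the squared Euclidean distance, and the divergence of a vector from itself is zero up to rounding. Other (alpha, beta) pairs weight large and small spectral values differently, which is the flexibility the abstract refers to.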
