A Generalized Nonnegative Tensor Factorization Approach for Distant Speech Recognition With Distributed Microphones

Seyedmahdad Mirsamadi; John H. L. Hansen



Abstract

Automatic speech recognition (ASR) using distant (far-field) microphones is a challenging task in which room reverberation is one of the primary causes of performance degradation. This study proposes a multichannel spectral enhancement method for reverberation-robust ASR using distributed microphones. The proposed method uses nonnegative tensor factorization to identify the clean speech component from a set of observed reverberant spectrograms from the different channels. The general family of alpha–beta divergences is used for the tensor decomposition task, which provides increased flexibility for the algorithm and is shown to provide improvements in highly reverberant scenarios. Unlike many conventional array processing solutions, the proposed method does not require closely spaced microphones and is independent of source and microphone locations. The algorithm can automatically adapt to unbalanced direct-to-reverberation ratios among different channels, which is useful in blind scenarios in which no information is available about source-to-microphone distances. For a medium-vocabulary distant ASR task based on TIMIT utterances, and using clean-trained deep neural network acoustic models, absolute word error rate (WER) improvements of +17.2%, +20.7%, and +23.2% are achieved in single-channel, two-channel, and four-channel scenarios, respectively.
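The abstract names the alpha-beta divergence family but gives neither the tensor model nor the update rules. As a rough illustration only, the sketch below uses the standard textbook definition of the alpha-beta divergence and its usual multiplicative updates, applied to a plain two-factor matrix factorization of a single nonnegative "spectrogram". This is a simplification and not the paper's multichannel tensor method. The names `ab_divergence` and `ab_nmf`, the random test matrix, and the values alpha = 1.0, beta = 0.5 are all assumptions made for the example.

```python
import numpy as np


def ab_divergence(P, Q, alpha, beta):
    """Alpha-beta divergence D(P || Q) for positive arrays P and Q.

    Only the general case is handled here: alpha, beta and alpha + beta
    must all be nonzero (the limit cases are left out of this sketch).
    """
    if alpha == 0 or beta == 0 or alpha + beta == 0:
        raise ValueError("limit cases are not handled in this sketch")
    s = alpha + beta
    terms = P**alpha * Q**beta - (alpha / s) * P**s - (beta / s) * Q**s
    return float(-terms.sum() / (alpha * beta))


def ab_nmf(V, rank, alpha=1.0, beta=0.5, n_iter=200, seed=0, eps=1e-12):
    """Factorize V ~ W @ H with multiplicative alpha-beta updates.

    Single-matrix simplification (assumption): V stands in for one
    channel's magnitude spectrogram, W for spectral bases, H for
    activations. Requires alpha != 0.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + 0.1
    H = rng.random((rank, n_frames)) + 0.1
    history = []
    for _ in range(n_iter):
        # Update activations H with W held fixed.
        Q = W @ H + eps
        num = W.T @ (V**alpha * Q ** (beta - 1))
        den = W.T @ Q ** (alpha + beta - 1) + eps
        H *= (num / den) ** (1.0 / alpha)
        # Update bases W with H held fixed.
        Q = W @ H + eps
        num = (V**alpha * Q ** (beta - 1)) @ H.T
        den = Q ** (alpha + beta - 1) @ H.T + eps
        W *= (num / den) ** (1.0 / alpha)
        history.append(ab_divergence(V + eps, W @ H + eps, alpha, beta))
    return W, H, history


# Hypothetical stand-in data: a random positive 20 x 30 matrix.
V = np.random.default_rng(1).random((20, 30)) + 0.01
W, H, history = ab_nmf(V, rank=5)
print("first divergence:", history[0])
print("last divergence:", history[-1])
```

With alpha = beta = 1 the divergence reduces to half the squared Euclidean error, and the updates reduce to the familiar least-squares multiplicative rules, so varying (alpha, beta) is one way to see the flexibility the abstract refers to.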


Bibliographic details

Source: Audio, Speech, and Language Processing, IEEE/ACM Transactions on, 2016, Issue 10, pp. 1721-1731 (11 pages)
Authors: Seyedmahdad Mirsamadi; John H. L. Hansen
Affiliation: Center for Robust Speech Systems, The University of Texas at Dallas, Richardson, TX, USA
Original format: PDF
Language: English
Keywords: Distant speech recognition; distributed far-field microphones; nonnegative tensor factorization; reverberation


