An information fusion framework with multi-channel feature concatenation and multi-perspective system combination for the deep-learning-based robust recognition of microphone array speech



Abstract

We present an information fusion approach to the robust recognition of multi-microphone speech. It is based on a deep learning framework with a large deep neural network (DNN) consisting of subnets designed from different perspectives. Multiple knowledge sources are then reasonably integrated via an early fusion of normalized noisy features from multiple beamforming techniques, enhanced speech features, speaker-related features, and other auxiliary features, concatenated as the input to each subnet to compensate for imperfect front-end processing. Furthermore, a late fusion strategy is utilized to leverage the complementary nature of the different subnets by combining the outputs of all subnets to produce a single output set. In our empirical study on the CHiME-3 task of recognizing microphone array speech, we demonstrate that the different information sources complement each other and that both early and late fusions provide significant performance gains, with an overall word error rate (WER) of 10.55% when combining 12 systems. Moreover, by utilizing an improved beamforming technique and a powerful recurrent neural network (RNN)-based language model for rescoring, a WER of 9.08% can be achieved with one-pass decoding for the best single DNN system among all of the systems submitted to the CHiME-3 challenge.
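To make the two fusion stages concrete, here is a minimal Python/NumPy sketch of the idea. It is not the authors' implementation: the feature names, dimensionalities, and the simple averaging of subnet posteriors are illustrative assumptions only.

    # Minimal sketch of early fusion (feature concatenation) and late fusion
    # (subnet output combination); names and sizes are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(0)
    num_frames = 100

    # Early fusion: concatenate per-frame streams into one input vector per frame.
    beamformed   = rng.standard_normal((num_frames, 40))   # features from beamformed audio
    enhanced     = rng.standard_normal((num_frames, 40))   # features from enhanced speech
    speaker_feat = rng.standard_normal((num_frames, 100))  # speaker-related, e.g. an i-vector tiled per frame
    auxiliary    = rng.standard_normal((num_frames, 3))    # other auxiliary features
    dnn_input = np.concatenate([beamformed, enhanced, speaker_feat, auxiliary], axis=1)
    print(dnn_input.shape)  # (100, 183): one concatenated feature vector per frame

    # Late fusion: combine per-frame posteriors produced by several subnets.
    num_senones = 2000
    def subnet_posteriors(x):
        # Stand-in for one trained subnet: random logits followed by a softmax.
        logits = rng.standard_normal((x.shape[0], num_senones))
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    outputs = [subnet_posteriors(dnn_input) for _ in range(3)]
    combined = np.mean(outputs, axis=0)  # simple average; weights could also be tuned

In the actual framework each subnet is a trained DNN and the combined posteriors feed the decoder; the plain averaging above only stands in for whatever system-combination scheme is used.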

Bibliographic record

  • Source
    Computer Speech and Language | 2017, Issue 11 | pp. 517-534 | 18 pages
  • Author affiliations

    University of Science and Technology of China, Hefei, Anhui, China;

    University of Science and Technology of China, Hefei, Anhui, China;

    University of Science and Technology of China, Hefei, Anhui, China;

    University of Science and Technology of China, Hefei, Anhui, China;

    University of Science and Technology of China, Hefei, Anhui, China;

    Georgia Institute of Technology, Atlanta, GA, United States;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI);
  • Format: PDF
  • Language: eng
  • Chinese Library Classification:
  • Keywords

    CHiME challenge; Deep learning; Information fusion; Microphone array; Robust speech recognition;

  • Date added: 2022-08-18 02:11:09
