IEEE/ACM Transactions on Audio, Speech, and Language Processing

A Regression Approach to Single-Channel Speech Separation Via High-Resolution Deep Neural Networks

Abstract

We propose a novel data-driven approach to single-channel speech separation based on deep neural networks (DNNs) to directly model the highly nonlinear relationship between speech features of a mixed signal containing a target speaker and other interfering speakers. We focus our discussion on a semisupervised mode to separate speech of the target speaker from an unknown interfering speaker, which is more flexible than the conventional supervised mode with known information of both the target and interfering speakers. Two key issues are investigated. First, we propose a DNN architecture with dual outputs of the features of both the target and interfering speakers, which is shown to achieve a better generalization capability than that with output features of only the target speaker. Second, we propose using a set of multiple DNNs, each intended to be signal-noise-dependent (SND), to cope with the difficulty that one single general DNN could not accommodate well all the speaker mixing variabilities at different signal-to-noise ratio (SNR) levels. Experimental results on the speech separation challenge (SSC) data demonstrate that our proposed framework achieves better separation results than other conventional approaches in a supervised or semisupervised mode. SND-DNNs could also yield significant performance improvements over a general DNN for speech separation in low SNR cases. Furthermore, for automatic speech recognition (ASR) following speech separation, this purely front-end processing with a single set of speaker-independent ASR acoustic models achieves a relative word error rate (WER) reduction of 11.6% over a state-of-the-art separation and recognition system in which a complicated joint back-end decoding framework with multiple sets of speaker-dependent ASR acoustic models needs to be implemented. When speaker-adaptive ASR acoustic models for the target speakers are adopted for the enhanced signals, another 12.1% WER reduction over our best speaker-independent ASR system is achieved.
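The two key ideas in the abstract lend themselves to a compact illustration. Below is a minimal, hypothetical PyTorch sketch of a dual-output regression DNN (one output head per speaker) combined with signal-noise-dependent model selection; the layer sizes, feature dimensions, SNR ranges, and helper names are illustrative assumptions, not the authors' published configuration.

```python
# Hypothetical sketch only: dimensions, depths, and SNR ranges are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualOutputDNN(nn.Module):
    """Regression DNN mapping spliced spectral features of the mixture to
    feature estimates of BOTH the target and the interfering speaker."""
    def __init__(self, feat_dim=257, context=7, hidden=2048, num_layers=3):
        super().__init__()
        in_dim = feat_dim * context                     # context window of frames
        layers = []
        for i in range(num_layers):
            layers += [nn.Linear(in_dim if i == 0 else hidden, hidden), nn.Sigmoid()]
        self.trunk = nn.Sequential(*layers)
        self.target_head = nn.Linear(hidden, feat_dim)  # target-speaker features
        self.interf_head = nn.Linear(hidden, feat_dim)  # interfering-speaker features

    def forward(self, mixture_feats):
        h = self.trunk(mixture_feats)
        return self.target_head(h), self.interf_head(h)

def dual_mse_loss(pred_tgt, pred_int, ref_tgt, ref_int):
    # Plain MMSE regression on both speakers' reference features; the extra
    # interferer branch is what the abstract credits with better generalization.
    return F.mse_loss(pred_tgt, ref_tgt) + F.mse_loss(pred_int, ref_int)

# Signal-noise-dependent (SND) operation: one DNN per SNR range, with the
# model chosen by an SNR estimate of the test mixture (ranges assumed here).
snd_models = {(-9.0, -3.0): DualOutputDNN(),
              (-3.0, 3.0): DualOutputDNN(),
              (3.0, 9.0): DualOutputDNN()}

def separate(mixture_feats, estimated_snr_db):
    for (lo, hi), model in snd_models.items():
        if lo <= estimated_snr_db < hi:
            model.eval()
            with torch.no_grad():
                return model(mixture_feats)
    # outside all trained ranges: fall back to the model with the nearest range
    (lo, hi), model = min(snd_models.items(),
                          key=lambda kv: min(abs(estimated_snr_db - kv[0][0]),
                                             abs(estimated_snr_db - kv[0][1])))
    model.eval()
    with torch.no_grad():
        return model(mixture_feats)
```

In this reading, the target-feature estimate returned by separate() would feed waveform reconstruction and then a single speaker-independent ASR system, matching the purely front-end pipeline the abstract evaluates; training both heads jointly with dual_mse_loss is what makes this a regression approach rather than a mask-classification one.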