IEEE/ACM Transactions on Audio, Speech, and Language Processing

Boosting Contextual Information for Deep Neural Network Based Voice Activity Detection


Abstract

Voice activity detection (VAD) is an important topic in audio signal processing. Contextual information is important for improving VAD performance at low signal-to-noise ratios. Here we explore contextual information with machine learning methods at three levels. At the top level, we employ an ensemble learning framework, named multi-resolution stacking (MRS), which is a stack of ensemble classifiers. Each classifier in a building block takes as input the concatenation of the predictions of its lower building blocks and an expansion of the raw acoustic feature over a given window (called a resolution). At the middle level, we describe the base classifier in MRS, named boosted deep neural network (bDNN). bDNN first generates multiple base predictions for a single frame from its different contexts using only one DNN, and then aggregates these base predictions into a better prediction of the frame; unlike computationally expensive boosting methods, it does not train an ensemble of classifiers to obtain the multiple base predictions. At the bottom level, we employ the multi-resolution cochleagram feature, which incorporates contextual information by concatenating cochleagram features at multiple spectrotemporal resolutions. Experimental results show that the MRS-based VAD outperforms other VADs by a considerable margin. Moreover, when trained on a large number of noise types and a wide range of signal-to-noise ratios, the MRS-based VAD shows surprisingly good generalization to unseen test scenarios, approaching the performance achieved with noise-dependent training.
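The bDNN idea described in the abstract — a single network emits one base prediction for every frame inside a context window, and the overlapping base predictions that different windows produce for the same frame are then aggregated — can be sketched in numpy. This is a minimal illustration under our own assumptions: `expand_frames`, `bdnn_predict`, the stub model, and the plain-average aggregation are hypothetical names and simplifications, not code or the exact aggregation rule from the paper.

```python
import numpy as np

def expand_frames(X, w):
    """Concatenate each frame with its +/- w neighbors.

    X: (T, d) per-frame features. Edge frames are padded by repeating
    the first/last frame. Returns (T, d * (2w + 1)) expanded inputs.
    """
    T, d = X.shape
    Xp = np.pad(X, ((w, w), (0, 0)), mode="edge")
    return np.stack([Xp[t:t + 2 * w + 1].reshape(-1) for t in range(T)])

def bdnn_predict(X, model, w):
    """bDNN-style prediction with one model and multiple base predictions.

    `model` maps the expanded input of frame t to 2w + 1 scores, one for
    each frame in t's window. Every frame u therefore receives base
    predictions from up to 2w + 1 overlapping windows; here they are
    aggregated by simple averaging into the final per-frame score.
    """
    T, _ = X.shape
    Z = model(expand_frames(X, w))          # shape (T, 2w + 1)
    scores = np.zeros(T)
    counts = np.zeros(T)
    for t in range(T):
        for j, offset in enumerate(range(-w, w + 1)):
            u = t + offset                  # frame that Z[t, j] predicts
            if 0 <= u < T:
                scores[u] += Z[t, j]
                counts[u] += 1
    return scores / counts
```

A constant stub model makes the aggregation easy to check: if every window predicts 0.5 for every frame it covers, the averaged per-frame scores are all 0.5. In the paper's setting the model would be a trained DNN and the scores would be thresholded into speech/non-speech decisions.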
