Voice Activity Detection for Transient Noisy Environment Based on Diffusion Nets

Ivry Amir; Berdugo Baruch; Cohen Israel



Abstract

We address voice activity detection in acoustic environments that contain both transient and stationary noise, which often occur in real-life scenarios. We exploit the unique spatial patterns of speech and non-speech audio frames by independently learning their underlying geometric structure. This is done through a deep encoder-decoder neural network architecture. The encoder maps spectral features with temporal information to their low-dimensional representations, which are generated by applying the diffusion maps method. The encoder feeds a decoder that maps the embedded data back into the high-dimensional space. A deep neural network, trained to separate speech from non-speech frames, is obtained by concatenating the decoder to the encoder, resembling the known diffusion nets architecture. Experimental results show enhanced performance compared to competing voice activity detection methods. The improvement is achieved in accuracy, robustness, and generalization ability. Our model runs in real time and can be integrated into audio-based communication systems. We also present a batch algorithm that obtains even higher accuracy for offline applications.
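As a rough illustration of the diffusion maps step mentioned in the abstract, the following is a minimal NumPy sketch of a textbook diffusion map, not the authors' implementation. The function name `diffusion_map`, the median kernel-scale heuristic, and the synthetic two-cluster data standing in for speech and non-speech feature frames are all assumptions made for this example. In a diffusion-nets-style setup, an encoder network would be trained to regress onto coordinates like these, and a decoder to map them back to the feature space.

```python
import numpy as np

def diffusion_map(X, n_components=2, epsilon=None, t=1):
    """Basic diffusion map of the rows of X (hypothetical helper, textbook form).

    Returns the embedding (n_samples, n_components) and all eigenvalues
    of the Markov matrix, sorted in descending order.
    """
    # Pairwise squared Euclidean distances between frames.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    np.fill_diagonal(d2, 0.0)

    # Assumed heuristic: kernel scale = median of the nonzero squared distances.
    if epsilon is None:
        epsilon = np.median(d2[d2 > 0])

    # Gaussian affinity kernel and its row sums (degrees).
    K = np.exp(-d2 / epsilon)
    d = K.sum(axis=1)

    # S = D^{-1/2} K D^{-1/2} is symmetric and shares eigenvalues with
    # the row-stochastic Markov matrix P = D^{-1} K.
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Right eigenvectors of P are recovered as D^{-1/2} v.
    psi = vecs / np.sqrt(d)[:, None]

    # Skip the trivial constant eigenvector (eigenvalue 1) and scale the
    # remaining coordinates by their eigenvalues raised to the diffusion time t.
    idx = slice(1, n_components + 1)
    return psi[:, idx] * vals[idx] ** t, vals

# Synthetic stand-in for two classes of 8-dimensional feature frames.
rng = np.random.default_rng(0)
class_a = rng.normal(0.0, 0.1, size=(20, 8))
class_b = rng.normal(3.0, 0.1, size=(20, 8))
X = np.vstack([class_a, class_b])

emb, vals = diffusion_map(X, n_components=2)
print(emb.shape)
```

On data like this, the first diffusion coordinate takes opposite signs on the two clusters, which is the kind of geometric separation a downstream classifier can exploit.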


Bibliographic details

Source
IEEE Journal of Selected Topics in Signal Processing | 2019, Issue 2 | pp. 254-264 | 11 pages
Authors
Ivry Amir; Berdugo Baruch; Cohen Israel
Affiliation

Technion Israel Inst Technol Andrew & Erna Viterbi Fac Elect Engn IL-3200003 Haifa Israel

Original format: PDF
Language: English
Keywords
Deep learning; diffusion maps; voice activity detection

