Published in: IEEE Journal of Selected Topics in Signal Processing

Multi-Speaker DOA Estimation Using Deep Convolutional Networks Trained With Noise Signals


Abstract

Supervised learning-based methods for source localization are data driven, so they can be adapted to different acoustic conditions through training, and they have been shown to be robust in adverse acoustic environments. In this paper, a convolutional neural network (CNN) based supervised learning method for estimating the direction of arrival (DOA) of multiple speakers is proposed. Multi-speaker DOA estimation is formulated as a multi-class multi-label classification problem, where the assignment of each DOA label to the input feature is treated as a separate binary classification problem. The phase component of the short-time Fourier transform (STFT) coefficients of the received microphone signals is fed directly into the CNN, and the features for DOA estimation are learnt during training. Using the assumption of disjoint speaker activity in the STFT domain, a novel method is proposed to train the CNN with synthesized noise signals. Experimental evaluation with both simulated and measured acoustic impulse responses demonstrates that the proposed DOA estimation approach adapts to unseen acoustic conditions and is robust to unseen noise types. Additional empirical investigation shows that, with an array of M microphones, the proposed framework yields the best localization performance with M-1 convolution layers. Finally, the proposed method is shown to accurately localize speakers in a dynamic acoustic scenario with a varying number of sources.
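The problem formulation in the abstract can be illustrated with a short NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the frame length (`n_fft=256`), the Hann window, the 5 degree DOA grid from 0 to 180 degrees, the top-N decoding rule and every function name below are hypothetical choices. It shows the (M, K) phase-map input for one STFT frame, the multi-hot target in which each DOA class is an independent binary label, the matching per-class binary cross-entropy loss, and, assuming 2-tap kernels along the microphone axis (not stated in the abstract), why M-1 "valid" convolutions would collapse an M-microphone axis to length 1.

```python
import numpy as np


def stft_phase_frame(signals, frame_start, n_fft=256):
    """Phase of one Hann-windowed STFT frame for every microphone.

    signals: array of shape (M, T), one row per microphone.
    Returns an (M, K) array of phases in radians, with K = n_fft // 2 + 1.
    """
    window = np.hanning(n_fft)
    frame = signals[:, frame_start:frame_start + n_fft] * window  # (M, n_fft)
    spectrum = np.fft.rfft(frame, axis=1)  # one-sided spectrum, shape (M, K)
    return np.angle(spectrum)  # magnitude is discarded; only phase is kept


def encode_doas(doas_deg, grid_deg):
    """Multi-hot target: each DOA class on the grid is its own binary label."""
    target = np.zeros(len(grid_deg))
    for doa in doas_deg:
        target[np.argmin(np.abs(grid_deg - doa))] = 1.0  # nearest grid class
    return target


def binary_cross_entropy(logits, target, eps=1e-12):
    """Sum of independent per-class binary losses on sigmoid outputs."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return -np.sum(target * np.log(p + eps) + (1.0 - target) * np.log(1.0 - p + eps))


def decode_top_n(posteriors, grid_deg, n_sources):
    """Hypothetical decoding rule: keep the n_sources most probable classes."""
    idx = np.argsort(posteriors)[::-1][:n_sources]
    return np.sort(grid_deg[idx])


def mic_axis_after_convs(n_mics, n_layers, kernel=2):
    """Length of the microphone axis after n_layers 'valid' convolutions.

    Assumes (hypothetically) a kernel of `kernel` taps along the microphone
    axis; each such layer shortens that axis by kernel - 1.
    """
    return n_mics - n_layers * (kernel - 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 4 channels of independent white noise: a stand-in only, not spatialized
    noise = rng.standard_normal((4, 16000))
    print(stft_phase_frame(noise, 0).shape)  # (4, 129)

    grid = np.arange(0, 181, 5)  # 37 hypothetical DOA classes, 5 degree step
    target = encode_doas([30, 95], grid)
    print(target.sum())  # 2.0

    posteriors = np.full(len(grid), 0.1)
    posteriors[[6, 19]] = 0.9  # grid indices of 30 and 95 degrees
    print(decode_top_n(posteriors, grid, 2))  # [30 95]

    print(mic_axis_after_convs(4, 3))  # 1
```

Under the 2-tap assumption, M-1 layers is exactly the depth at which the microphone axis shrinks to one, which is one plausible reading of the empirical M-1 result; the actual kernel sizes and network layout should be taken from the paper itself.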
