Journal: IEEE Journal of Selected Topics in Signal Processing

Multi-Speaker DOA Estimation Using Deep Convolutional Networks Trained With Noise Signals


Abstract

Supervised learning-based methods for source localization, being data driven, can be adapted to different acoustic conditions via training and have been shown to be robust to adverse acoustic environments. In this paper, a convolutional neural network (CNN) based supervised learning method for estimating the direction of arrival (DOA) of multiple speakers is proposed. Multi-speaker DOA estimation is formulated as a multi-class multi-label classification problem, where the assignment of each DOA label to the input feature is treated as a separate binary classification problem. The phase component of the short-time Fourier transform (STFT) coefficients of the received microphone signals is directly fed into the CNN, and the features for DOA estimation are learnt during training. Utilizing the assumption of disjoint speaker activity in the STFT domain, a novel method is proposed to train the CNN with synthesized noise signals. Through experimental evaluation with both simulated and measured acoustic impulse responses, the ability of the proposed DOA estimation approach to adapt to unseen acoustic conditions and its robustness to unseen noise types are demonstrated. Through additional empirical investigation, it is also shown that with an array of M microphones our proposed framework yields the best localization performance with M-1 convolution layers. The ability of the proposed method to accurately localize speakers in a dynamic acoustic scenario with a varying number of sources is also shown.
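
The input representation and the multi-label formulation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The frame length, hop size, Hann window, 5-degree DOA grid over 0 to 180 degrees, and the top-N decoding rule are all assumptions chosen for illustration, and the function names `stft_phase`, `multi_hot_doa`, and `decode_doas` are hypothetical. The CNN itself is omitted.

```python
import numpy as np

def stft_phase(signals, frame_len=512, hop=256):
    """Phase of the STFT for each microphone.

    signals: array of shape (M, T), one row per microphone.
    Returns an array of shape (n_frames, M, frame_len // 2 + 1).
    Frame length, hop, and window are illustrative assumptions.
    """
    _, n_samples = signals.shape
    window = np.hanning(frame_len)
    n_frames = 1 + (n_samples - frame_len) // hop
    frames = np.stack([
        signals[:, i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])  # (n_frames, M, frame_len)
    spectrum = np.fft.rfft(frames, axis=-1)
    return np.angle(spectrum)  # only the phase is kept as the input feature

def multi_hot_doa(doas_deg, resolution=5, max_angle=180):
    """Multi-label target: one binary label per DOA class (assumed grid)."""
    classes = np.arange(0, max_angle + 1, resolution)
    target = np.zeros(len(classes), dtype=np.float32)
    for doa in doas_deg:
        target[np.argmin(np.abs(classes - doa))] = 1.0
    return target

def decode_doas(posteriors, n_sources, resolution=5):
    """Pick the n_sources classes with the highest posterior (assumed rule)."""
    top = np.argsort(posteriors)[::-1][:n_sources]
    return sorted(int(i) * resolution for i in top)

# Example with M = 4 microphones and random noise standing in for recordings.
rng = np.random.default_rng(0)
phase_maps = stft_phase(rng.standard_normal((4, 4096)))
target = multi_hot_doa([30, 92])
estimated = decode_doas(target, n_sources=2)
```

Each time frame yields an M x K phase map (K frequency bins), and each DOA class is an independent binary label, so two simultaneously active speakers simply produce two active entries in the target vector.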
