Journal: EURASIP Journal on Audio, Speech, and Music Processing

Binaural sound localization based on deep neural network and affinity propagation clustering in mismatched HRTF condition


Abstract

Binaural sound source localization is an important and widely used perceptually based method, and many researchers have applied it in machine learning studies based on the head-related transfer function (HRTF). Because the HRTF is closely related to human physiological structure, HRTFs vary between individuals. Related machine learning studies to date tend to focus on binaural localization in reverberant or noisy environments, or in conditions with multiple simultaneously active sound sources. In contrast, the mismatched HRTF condition, in which the HRTFs used to generate the training and test sets are different, is rarely studied. This mismatch degrades localization performance. A basic solution is to introduce more data to improve generalization performance, but simply increasing the data volume is data-inefficient. In this paper, we propose a data-efficient method based on a deep neural network (DNN) and clustering to improve binaural localization performance in the mismatched HRTF condition. First, we analyze the relationship between binaural cues and sound source location with a classification DNN. Different HRTFs are used to generate the training and test sets, and we study the localization performance of the DNN model trained on each training set across the different test sets. The results show that the same model performs differently on different test sets, while different models may perform similarly on the same test set; the results also show a clustering trend. Second, the HRTFs are divided into several clusters. Finally, the HRTFs corresponding to each cluster center are selected to generate a new training set, which is used to train a more generalized DNN model. The experimental results show that the proposed method generalizes better than the baseline methods in the mismatched HRTF condition and performs almost as well as a DNN trained with a large number of HRTFs, which means the proposed method is data-efficient.
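The clustering step can be illustrated with a minimal pure-Python sketch of affinity propagation, the algorithm named in the title. The abstract does not say how similarity between HRTFs is measured or how the clustering is configured, so everything specific here is an assumption made for illustration: the helper `profile_similarity`, the use of cross-test accuracy rows as features, the negative squared Euclidean similarity, and the `preference`, `damping`, and `n_iter` values.

```python
def profile_similarity(acc):
    """Similarity between HRTFs from a cross-test accuracy matrix.

    acc[i][j] is a hypothetical accuracy of the model trained on HRTF i and
    tested on HRTF j. Using the negative squared Euclidean distance between
    rows is an assumption of this sketch, not a detail taken from the paper.
    """
    n = len(acc)
    return [[-sum((x - y) ** 2 for x, y in zip(acc[i], acc[k]))
             for k in range(n)] for i in range(n)]


def affinity_propagation(sim, preference, damping=0.5, n_iter=200):
    """Plain affinity propagation on an n x n similarity matrix (n >= 2).

    Returns (exemplars, labels): the indices chosen as cluster centers and,
    for every item, the index of the exemplar it is assigned to.
    """
    n = len(sim)
    # Work on a copy with the shared preference on the diagonal.
    s = [[preference if i == k else sim[i][k] for k in range(n)]
         for i in range(n)]
    r = [[0.0] * n for _ in range(n)]  # responsibilities
    a = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(n_iter):
        # Responsibility: how well k suits i compared with i's best alternative.
        for i in range(n):
            for k in range(n):
                best_other = max(a[i][j] + s[i][j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        # Availability: how much support k has gathered from the other items.
        for i in range(n):
            for k in range(n):
                support = sum(max(0.0, r[j][k])
                              for j in range(n) if j != i and j != k)
                target = support if i == k else min(0.0, r[k][k] + support)
                a[i][k] = damping * a[i][k] + (1 - damping) * target
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:
        return [], []
    # Exemplars label themselves; every other item joins its most similar exemplar.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: s[i][k])
              for i in range(n)]
    return exemplars, labels
```

Under these assumptions, one would call `affinity_propagation(profile_similarity(acc), preference=p)` and use the HRTFs at the returned exemplar indices to build the reduced training set. A lower `preference` yields fewer clusters; the sketch adds no tie-breaking noise, so perfectly symmetric inputs may fail to converge.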
