Journal: Advanced Robotics: The International Journal of the Robotics Society of Japan

Improved binaural sound localization and tracking for unknown time-varying number of speakers



Abstract

A binaural sound source localization (SSL) and tracking method for multiple sound sources has been developed, based on the generalized cross-correlation (GCC) method weighted by the phase transform (PHAT). Accurate binaural audition is important for giving robots and other systems inexpensive and widely applicable auditory capabilities. Conventional SSL based on the GCC-PHAT method is degraded by three problems: low resolution of the time difference of arrival (TDOA) estimate, interference created when sound waves arrive at a microphone from two directions around the robot head, and impaired performance when there are multiple speakers. The low-resolution problem is solved by using a maximum-likelihood-based SSL method in the frequency domain. The multipath interference problem is avoided by incorporating a new time delay factor into the GCC-PHAT method under the assumption of a spherical robot head. Performance with multiple speakers is improved by a multisource speech tracking method consisting of voice activity detection (VAD) and K-means clustering. The standard K-means clustering algorithm is extended to track an unknown, time-varying number of speakers by adding two steps: one that automatically increases the number of clusters and one that eliminates clusters containing incorrect direction estimates. Experiments conducted on the SIG-2 humanoid robot show that this method outperforms the conventional SSL method: it reduces localization errors by 18.1° on average and by over 37° in the side directions. It also tracks multiple speakers in real time with tracking errors below 4.35°.
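
For orientation, the conventional GCC-PHAT baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of the textbook technique, not the improved method of the paper: the function names `gcc_phat` and `tdoa_to_azimuth`, the `interp` upsampling parameter, and the free-field (no head) geometry in the azimuth conversion are all assumptions made here for illustration. The paper's maximum-likelihood refinement and spherical-head time delay factor are not reproduced.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None, interp=1):
    """Estimate the delay of `sig` relative to `ref`, in seconds (hypothetical helper)."""
    n = sig.shape[0] + ref.shape[0]          # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)                   # cross-power spectrum
    R /= np.abs(R) + 1e-12                   # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(R, n=interp * n)       # interp > 1 upsamples the correlation
    max_shift = interp * n // 2
    if max_tau is not None:
        max_shift = min(int(interp * fs * max_tau), max_shift)
    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(interp * fs)

def tdoa_to_azimuth(tau, mic_distance, c=343.0):
    """Free-field azimuth in degrees (assumption: no head between the microphones)."""
    return np.degrees(np.arcsin(np.clip(c * tau / mic_distance, -1.0, 1.0)))

# Usage: a noise burst delayed by 5 samples at 16 kHz.
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)
sig = np.concatenate((np.zeros(5), ref))[:1024]
tau = gcc_phat(sig, ref, fs=16000)
print(tau * 16000)  # estimated delay in samples
```

With `interp=1` the delay estimate is quantized to whole samples, which illustrates the low-resolution limitation of the time difference of arrival estimate that the abstract mentions.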
