Published in: Annual Conference on Information Sciences and Systems

A temporal saliency map for modeling auditory attention

Abstract

The auditory system is flooded with information throughout our daily lives. Rather than processing all of this information, we selectively shift our attention to various auditory events: either events of interest (top-down attention) or events that capture our attention exogenously (bottom-up attention). In this work, we are concerned with the bottom-up, stimulus-driven aspects of human attention. The saliency of an auditory event is measured by how much the event differs from the surrounding sounds that precede it in time. To calculate this, we propose a novel auditory saliency map that is defined only over time. The proposed model is contrasted with previously published auditory saliency maps, which treat the two-dimensional auditory time-frequency spectrogram as an image that can be analyzed using visual saliency models. Instead, our proposed model capitalizes on the rich high-dimensional feature space that defines auditory events, where each acoustic dimension is processed across multiple scales. The resulting normalized feature maps are then combined over time into a single temporal saliency map. The peaks of the temporal saliency map indicate the locations of the salient events in the auditory scene. We validate the accuracy of the proposed model in simulated test scenarios of simple and complex sound clips. By exploiting the unique aspects of auditory processing that cannot be readily captured by visual processes, we are able to outperform other auditory saliency models, all while highlighting the commonalities and differences between the two modalities in processing salient events in everyday scenes.
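The pipeline outlined in the abstract (per-feature, multi-scale comparison with preceding sound, normalization, combination over time, peak picking) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the acoustic features, the scales, the normalization, or the peak criterion, so everything below is an assumption chosen for illustration. In particular, the function names `preceding_mean`, `temporal_saliency`, and `salient_event_frames`, the window widths in `scales`, the max-based normalization, the plain averaging across features, and the `threshold` value are all hypothetical.

```python
def preceding_mean(track, i, width):
    """Mean of up to `width` frames before frame i (assumed temporal context)."""
    window = track[max(0, i - width):i]
    # Frame 0 has no preceding context, so it is compared with itself.
    return sum(window) / len(window) if window else track[i]


def temporal_saliency(features, scales=(2, 4, 8, 16)):
    """Combine per-feature, multi-scale contrast into one saliency value per frame.

    features: list of equal-length lists, one per acoustic dimension
              (for example loudness or pitch; the choice is an assumption).
    scales:   context widths in frames (hypothetical values).
    """
    n_frames = len(features[0])
    saliency = [0.0] * n_frames
    for track in features:
        # Contrast of each frame against its preceding context, summed over scales.
        feature_map = [
            sum(abs(track[i] - preceding_mean(track, i, w)) for w in scales)
            for i in range(n_frames)
        ]
        # Assumed normalization: scale each feature map to a maximum of 1.
        peak = max(feature_map)
        if peak > 0:
            feature_map = [v / peak for v in feature_map]
        # Assumed combination rule: plain average across features.
        for i, v in enumerate(feature_map):
            saliency[i] += v / len(features)
    return saliency


def salient_event_frames(saliency, threshold=0.4):
    """Indices of local maxima of the saliency map at or above `threshold`."""
    return [
        i for i in range(1, len(saliency) - 1)
        if saliency[i] >= threshold
        and saliency[i] > saliency[i - 1]
        and saliency[i] >= saliency[i + 1]
    ]


# Toy scene: one feature steps up at frame 50, another has a click at frame 20.
step = [0.0] * 50 + [1.0] * 50
click = [0.0] * 100
click[20] = 1.0
print(salient_event_frames(temporal_saliency([step, click])))  # → [20, 50]
```

Because each frame is compared only with frames that precede it, the map responds at the onset of a change and decays as the new sound becomes part of the context, which is one simple way to read "differs from the surrounding sounds that precede it in time".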
