Conference: Annual Conference on Information Sciences and Systems (CISS)
A temporal saliency map for modeling auditory attention

Abstract

The auditory system is flooded with information throughout our daily lives. Rather than processing all of this information, we selectively shift our attention to various auditory events: either events of interest (top-down attention) or events that capture our attention exogenously (bottom-up attention). In this work, we are concerned with the bottom-up, stimulus-driven aspects of human attention. The saliency of an auditory event is measured by how much the event differs from the surrounding sounds that precede it in time. To calculate this, we propose a novel auditory saliency map that is defined only over time. The proposed model is contrasted with previously published auditory saliency maps, which treat the two-dimensional auditory time-frequency spectrogram as an image that can be analyzed using visual saliency models. Instead, our proposed model capitalizes on the rich high-dimensional feature space that defines auditory events, where each acoustic dimension is processed across multiple scales. These normalized feature maps are then combined over time into a single temporal saliency map. The peaks of the temporal saliency map indicate the locations of the salient events in the auditory scene. We validate the accuracy of the proposed model in simulated test scenarios of simple and complex sound clips. By exploiting the unique aspects of auditory processing that cannot be readily captured by visual processes, we are able to outperform other auditory saliency models, all while highlighting the commonalities and differences between the two modalities in processing salient events in everyday scenes.
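
The abstract does not state which acoustic features, scales, normalization, or combination rule the model uses. The Python sketch below is therefore only a rough illustration of the general idea it describes (compare each frame with the sounds that precede it, at several time scales, then normalize, combine, and pick peaks). It is not the authors' implementation. The function names, the window lengths, the min-max normalization, the averaging step, and the peak threshold are all assumptions made for illustration.

```python
import numpy as np

def causal_contrast(feature, window):
    """Absolute difference between each frame and the mean of the
    (up to) `window` frames that precede it. Assumed contrast measure."""
    feature = np.asarray(feature, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(feature)))  # csum[t] = sum of feature[:t]
    out = np.zeros_like(feature)
    for t in range(1, len(feature)):
        start = max(0, t - window)
        past_mean = (csum[t] - csum[start]) / (t - start)
        out[t] = abs(feature[t] - past_mean)
    return out

def normalize(x):
    """Min-max scale a map to [0, 1]; a flat map becomes all zeros."""
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)

def temporal_saliency(features, windows=(2, 4, 8)):
    """Combine per-feature, per-scale contrast maps into one curve over time.

    `features` has shape (n_features, n_frames). The window lengths stand in
    for "multiple scales" and are hypothetical values.
    """
    features = np.atleast_2d(np.asarray(features, dtype=float))
    maps = [normalize(causal_contrast(f, w)) for f in features for w in windows]
    return np.mean(maps, axis=0)

def find_peaks(saliency, threshold=0.5):
    """Frame indices of local maxima at or above an assumed threshold."""
    s = saliency
    return [t for t in range(1, len(s) - 1)
            if s[t] >= threshold and s[t] > s[t - 1] and s[t] >= s[t + 1]]

# Toy example: one feature that is silent except for a burst at frame 20.
x = np.zeros(40)
x[20] = 1.0
s = temporal_saliency([x])
print(find_peaks(s))  # [20]
```

Because each contrast value depends only on earlier frames, the resulting curve is defined purely over time, which matches the one-dimensional character of the map described in the abstract. A real system would replace the toy input with acoustic features extracted from audio.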
