Journal of Vision

How is visual search guided by shape? Using features from deep learning to understand preattentive shape space

Abstract

Visual search can be guided by target shape, but our understanding of how shape guides search has been limited to a set of specific shape features such as curvature, closure, line termination, aspect ratio, or intersection type (see Wolfe & Horowitz, 2004, for a review). These features, while important, do not capture the full range of preattentive shape processing. Understanding how shape guides search more generally requires a model of preattentive "shape space" that correctly represents the similarity between a target shape and different types of distractors. Here, we investigate whether the features learned by "deep learning" convolutional neural networks (CNNs) can be used as a proxy for this shape space. Previous work has shown that the visual representations learned by these networks generalize surprisingly well to a range of visual tasks (Razavian, Azizpour, Sullivan, & Carlsson, 2014). Eight participants performed a visual search task in which they searched for a randomly rotated shape target (a butterfly or rabbit silhouette) among different types of randomly rotated distractor shapes generated from a family of radial frequency patterns. To characterize the distractor shapes, we ran them through a CNN (Krizhevsky, Sutskever, & Hinton, 2012) and used the feature vector produced by the second-to-last layer of the network as candidate shape features. Easy and hard distractors for each target were well separated in this shape space, and hard distractors tended to be closer to the target in the neural network's representation of shape. Different participants tended to converge on a similar part of the feature space for hard distractors, but there was less agreement on which distractors were easiest. Our results suggest that the visual representation learned by a "deep learning" CNN is a reasonable approximation of the perceptual space in which humans process shape.
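
The comparison described in the abstract (hard distractors lying closer to the target than easy ones in the CNN feature space) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which distance measure was used, so Euclidean distance is assumed here, and the short vectors are made-up placeholders standing in for second-to-last-layer feature vectors, not data from the study. The function names are hypothetical.

```python
import math
from statistics import mean


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_distance_to_target(target, distractors):
    """Average distance from the target to each distractor in a set."""
    return mean(euclidean(target, d) for d in distractors)


# Placeholder 4-D vectors (assumption). In practice each shape image would be
# passed through a CNN and represented by its second-to-last-layer activations.
target = [1.0, 0.0, 0.0, 0.0]
hard_distractors = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 1.0, 0.0]]
easy_distractors = [[1.0, 3.0, 0.0, 0.0], [1.0, 0.0, 4.0, 0.0]]

hard_mean = mean_distance_to_target(target, hard_distractors)
easy_mean = mean_distance_to_target(target, easy_distractors)

# In this toy setup the hard set sits nearer the target than the easy set.
print(f"hard: {hard_mean:.2f}, easy: {easy_mean:.2f}")
```

With real features, the same two summary values would be computed per target and per participant; whether a simple mean distance is the right summary is itself an assumption of this sketch.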
