This thesis is arranged in two main parts. Each part relies on an approach that uses the methods of psychophysics and computational modeling to bring abstract or high-level theories of vision closer to a concrete neurobiological foundation.

The first part addresses the topic of visual object categorization. Previous studies using high-level models of categorization have left issues of neurobiological relevance unresolved, including how features are extracted from the image and what role memory capacity plays in categorization performance. We compared the ability of a comprehensive set of models to match the categorization performance of human observers while explicitly accounting for each model's number of free parameters. The most successful models did not require a large memory capacity, suggesting that a sparse, abstracted representation of category properties may underlie categorization performance. This type of representation, which differs from classical prototype abstraction, could also be extracted directly from two-dimensional images by a biologically plausible early vision model, rather than being built from experimenter-imposed features.

The second part addresses visual attention in its bottom-up, stimulus-driven form. Previous research showed that a model of bottom-up visual attention can account in part for the spatial locations fixated by humans while they free-view complex natural and artificial scenes. We used a similar framework to quantify how much the predictive ability of such a model may be enhanced by new model components based on several specific mechanisms within the functional architecture of the visual system. These components included richer interactions among orientation-tuned units, both at short range (for clutter reduction) and at long range (for contour facilitation). Subjects free-viewed naturalistic and artificial images while their eye movements were recorded.
The resulting fixation locations were compared with the models' predicted salience maps. We found that each new model component was important in attaining a strong quantitative correspondence between model and behavior. Finally, we compared the model predictions with the spatial locations obtained from a task that relied on mouse clicking rather than eye tracking. As these models become more accurate in predicting behaviorally relevant salient locations, they become useful for a range of applications in computer vision and human-machine interface design.
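The first part compares categorization models while explicitly accounting for their numbers of free parameters. The abstract does not say which criterion was used; one standard way to penalize free parameters is to compute the Akaike (AIC) or Bayesian (BIC) information criterion from each model's maximized log-likelihood. The sketch below assumes that approach. The class `FittedModel`, the model names, the log-likelihoods, the parameter counts, and the trial count are all hypothetical placeholders, not results from this thesis.

```python
import math
from dataclasses import dataclass


@dataclass
class FittedModel:
    name: str
    log_likelihood: float  # maximized log-likelihood of the observers' responses
    n_params: int          # number of free parameters fitted to the data


def aic(m: FittedModel) -> float:
    # Akaike information criterion: 2k - 2 ln L (lower is better)
    return 2 * m.n_params - 2 * m.log_likelihood


def bic(m: FittedModel, n_obs: int) -> float:
    # Bayesian information criterion: k ln n - 2 ln L (lower is better)
    return m.n_params * math.log(n_obs) - 2 * m.log_likelihood


def rank_models(models: list[FittedModel], n_obs: int) -> list[FittedModel]:
    # Sort by BIC: an extra parameter must buy enough likelihood to pay for itself.
    return sorted(models, key=lambda m: bic(m, n_obs))


if __name__ == "__main__":
    # Hypothetical values for illustration only (assumption, not thesis data).
    models = [
        FittedModel("model_A", log_likelihood=-410.0, n_params=12),
        FittedModel("model_B", log_likelihood=-415.0, n_params=4),
    ]
    n_trials = 600
    for m in rank_models(models, n_trials):
        print(f"{m.name}: AIC={aic(m):.1f}, BIC={bic(m, n_trials):.1f}")
```

With these made-up values, `model_B` ranks first under both criteria even though its raw log-likelihood is lower, which is the trade-off a parameter-penalized comparison is meant to expose.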
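The second part compares recorded fixation locations with the models' predicted salience maps, but the abstract does not name the comparison metric. One widely used choice is the normalized scanpath saliency (NSS): z-score the salience map over all pixels, then average the z-scores at the fixated pixels. A value near 0 indicates chance-level agreement, and larger positive values mean fixations land where the model predicts high salience. The sketch below assumes NSS; the function name `nss` and the toy map are hypothetical.

```python
import numpy as np


def nss(salience_map: np.ndarray, fixations: list[tuple[int, int]]) -> float:
    """Normalized scanpath saliency of (row, col) fixations under a salience map."""
    if not fixations:
        raise ValueError("need at least one fixation")
    s = salience_map.astype(float)
    std = s.std()
    if std == 0:
        raise ValueError("salience map is constant; NSS is undefined")
    z = (s - s.mean()) / std  # zero mean, unit variance across all pixels
    rows, cols = zip(*fixations)
    # Average the normalized salience at the fixated pixels.
    return float(z[list(rows), list(cols)].mean())


if __name__ == "__main__":
    # Toy 4x4 map with a single salient pixel (hypothetical data).
    toy = np.zeros((4, 4))
    toy[1, 2] = 1.0
    print(nss(toy, [(1, 2)]))  # fixation on the peak: high positive score
    print(nss(toy, [(0, 0)]))  # fixation off the peak: slightly negative score
```

Under the same assumptions, the function could also score mouse-click locations by passing click coordinates in place of fixations.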