Rather than attempting to fully interpret visual scenes in a parallel fashion, biological systems appear to employ a serial strategy by which an attentional spotlight rapidly selects circumscribed regions in the scene for further analysis. The spatiotemporal deployment of attention has been shown to be controlled by both bottom-up (image-based) and top-down (volitional) cues. We describe a detailed neuromimetic computer implementation of a bottom-up scheme for the control of visual attention, focusing on the problem of combining information across modalities (orientation, intensity, and color information) in a purely stimulus-driven manner. We have applied this model to a wide range of target detection tasks, using synthetic and natural stimuli. Performance has, however, remained difficult to objectively evaluate on natural scenes, because no objective reference was available for comparison. We present predicted search times for our model on the Search_2 database of rural scenes containing a military vehicle. Overall, we found a poor correlation between human and model search times. Further analysis, however, revealed that in 75% of the images, the model appeared to detect the target faster than humans (for comparison, we calibrated the model's arbitrary internal time frame such that 2 to 4 image locations were visited per second). It seems that this model, which had originally been designed not to find small, hidden military vehicles, but rather to find the few most obviously conspicuous objects in an image, performed as an efficient target detector on the Search_2 dataset. Further developments of the model are finally explored, in particular through a more formal treatment of the difficult problem of extracting suitable low-level features to be fed into the saliency map.
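The abstract summarizes, but does not specify, the cross-modal combination scheme and the serial attentional scan. The Python sketch below is only an illustration of the general idea under stated assumptions: a simple peak normalization and linear summation stand in for the model's actual map-normalization operator, and a crude zeroing of each visited neighborhood stands in for its winner-take-all dynamics with inhibition of return. All function names and parameters here are hypothetical.

```python
import numpy as np

def normalize(feature_map):
    """Rescale a feature map to [0, 1]; a simplified stand-in for the
    model's map-normalization operator (an assumption, not the paper's)."""
    m = feature_map - feature_map.min()
    peak = m.max()
    return m / peak if peak > 0 else m

def saliency(intensity, color, orientation):
    """Combine the three per-modality maps into one saliency map by
    normalizing each and averaging (simplified linear combination)."""
    return (normalize(intensity) + normalize(color) + normalize(orientation)) / 3.0

def serial_scan(sal_map, n_fixations=5, inhibition_radius=2):
    """Serially visit the most salient locations, suppressing each visited
    neighborhood so attention moves on (crude inhibition of return)."""
    s = sal_map.copy()
    visits = []
    for _ in range(n_fixations):
        y, x = np.unravel_index(np.argmax(s), s.shape)
        visits.append((y, x))
        s[max(0, y - inhibition_radius):y + inhibition_radius + 1,
          max(0, x - inhibition_radius):x + inhibition_radius + 1] = 0.0
    return visits

# Toy usage: random maps stand in for real filter outputs.
rng = np.random.default_rng(0)
intensity, color, orientation = (rng.random((32, 32)) for _ in range(3))
print(serial_scan(saliency(intensity, color, orientation), n_fixations=3))
```

Under the calibration mentioned above (2 to 4 image locations visited per second), a scanpath of N fixations in such a scheme would correspond to roughly N/4 to N/2 seconds of simulated search time.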