We introduce a biologically inspired localization system based on a "two microphones and one camera" configuration. Our aim is robust, multimodal 360° detection of objects, in particular humans, in the horizontal plane. Our approach draws on neurophysiological findings to assess the biological plausibility of the coding and extraction of spatial features, while also meeting the demands and constraints of a practical application in human-robot interaction. At present, we can demonstrate a model of binaural sound localization that uses interaural time differences (ITDs) for left/right detection and spectrum-based features to discriminate between front and back. The objective of the current work is the multimodal integration of different types of vision systems. In this paper, we summarize our experience with the design and use of the auditory model and propose a new model concept for the audio-visual integration of spatial localization hypotheses.
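To make the ITD-based left/right cue concrete, the following is a minimal sketch (not the authors' implementation) of estimating an interaural time difference from a two-microphone recording via cross-correlation; the function name, the synthetic test signal, and the 16 kHz sampling rate are illustrative assumptions.

```python
import numpy as np

def estimate_itd(left, right, fs):
    """Estimate the interaural time difference in seconds.

    Cross-correlates the right channel against the left; the lag of the
    correlation peak is the delay of the right channel. A positive ITD
    means the right ear lags, i.e. the source is on the left side.
    """
    corr = np.correlate(right, left, mode="full")
    lag = np.argmax(corr) - (len(left) - 1)  # shift peak index to signed lag
    return lag / fs

# Synthetic example: broadband noise, right channel delayed by 8 samples,
# as if the source were on the left.
fs = 16000
rng = np.random.default_rng(0)
sig = rng.standard_normal(1024)
delay = 8
left = sig
right = np.concatenate([np.zeros(delay), sig[:-delay]])

itd = estimate_itd(left, right, fs)  # 8 / 16000 = 0.5 ms
```

Note that a plain cross-correlation peak only resolves the left/right (lateral) angle and is front/back ambiguous for a two-microphone array, which is why the paper pairs it with spectrum-based features for front/back discrimination.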