This thesis is concerned with the technical realization of a scene analysis system that recognizes several object categories independently of their visual appearance and generates a semantic description of the sensed scene from the sensor's point of view. Such a scene analysis system is a necessary prerequisite for function-driven navigation of mobile robots in indoor environments and for intuitive man-machine interaction. The solution is founded on the model of human visual perception introduced by Palmer, which describes human scene analysis in terms of four processing layers with an increasing level of abstraction, from the retinal image to the semantic scene description, and a bidirectional processing of these layers. Each layer contains local a priori knowledge that humans apply during scene analysis. To implement Palmer's model, a new, modular, and application-independent processing concept is developed. The core of this concept is a knowledge base consisting of a semantic net extended by local experts. The declarative a priori knowledge of all processing layers is encoded in the vertices and edges of the semantic net, while the local experts, which are embedded in the vertices, contain the procedural a priori knowledge. The latter is represented as production rules that are responsible for creating local results, rating these results, and adapting the creation of results. An inference engine processes the knowledge base by transforming the semantic net into a sequence of its vertices and executing the rules of each local expert in the order of that sequence. As required by Palmer's model, the generated sequence of vertices is processed bidirectionally: a data-driven processing phase creates results, and a model-driven processing phase rates these results and adapts their creation.
At each point in time, a blackboard architecture provides access to all intermediate results, so the system can reuse knowledge it has already gained, especially for adaptations. The time needed for a scene analysis ultimately depends only on the number of adaptations triggered by the local experts. Based on this processing concept, an application-dependent knowledge base is developed for an indoor scene analysis that recognizes floors, walls, ceilings, obstacles, and doors. It mainly takes into account knowledge about the relations of these object categories to one another and to a stereo camera, which is the only sensor in this application. This knowledge is encoded in the local experts of the highest processing layer of Palmer's model by means of fuzzy sets and fuzzy logic. The evaluation of the indoor scene analysis is based on the correctness of the a priori knowledge used and its robustness against noisy and incomplete sensor data. Both criteria are assessed quantitatively and independently of each other. This requires control over the sensor noise, which is ensured by the use of a virtual reality (VR) environment. A comparison of the scene analysis results obtained in the VR environment with manually created references shows that, under ideal conditions, the a priori knowledge used allows 95% of the scene image regions to be assigned to the correct object categories. This leads to the conclusion that Palmer's model is suitable for technical realizations of scene analysis systems.
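
The processing concept described above (a semantic net linearized into a vertex sequence, local experts that create, rate, and adapt results, a shared blackboard, and a data-driven pass followed by a model-driven pass) might be sketched roughly as follows. This is only an illustrative toy under stated assumptions, not the implementation developed in the thesis: the names `LocalExpert`, `analyse`, `threshold`, `max_adaptations`, the `gain` parameter, and the three-vertex example net are all hypothetical, and a topological sort stands in for whatever linearization the thesis actually uses.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable, Dict, List

Blackboard = Dict[str, float]  # hypothetical: one numeric result per vertex


@dataclass
class LocalExpert:
    """Procedural knowledge attached to one vertex (hypothetical structure)."""
    create: Callable[[Blackboard], float]  # produce the local result
    rate: Callable[[Blackboard], float]    # score the local result in [0, 1]
    adapt: Callable[[Blackboard], None]    # adjust how results are created


def analyse(net: Dict[str, List[str]], experts: Dict[str, LocalExpert],
            board: Blackboard, threshold: float = 0.5,
            max_adaptations: int = 5) -> int:
    """Run both phases until no expert adapts; return the adaptation count."""
    # Assumption: the net maps each vertex to its predecessors, so a
    # topological order yields a bottom-up sequence of vertices.
    order = list(TopologicalSorter(net).static_order())
    adaptations = 0
    while True:
        # Data-driven phase: create results from the lowest layer upward.
        for vertex in order:
            board[vertex] = experts[vertex].create(board)
        # Model-driven phase: rate results from the highest layer downward
        # and adapt the creation of results that are rated too low.
        adapted = False
        for vertex in reversed(order):
            low = experts[vertex].rate(board) < threshold
            if low and adaptations < max_adaptations:
                experts[vertex].adapt(board)
                adaptations += 1
                adapted = True
        if not adapted:
            return adaptations


def accept(board: Blackboard) -> float:
    return 1.0  # this expert is always satisfied with its result


def no_adapt(board: Blackboard) -> None:
    pass  # this expert has nothing to adjust


def raise_gain(board: Blackboard) -> None:
    board["gain"] += 0.25  # hypothetical tunable parameter on the blackboard


# Toy three-layer net: image -> regions -> labels.
net = {"image": [], "regions": ["image"], "labels": ["regions"]}
experts = {
    "image": LocalExpert(lambda b: 1.0, accept, no_adapt),
    "regions": LocalExpert(lambda b: b["image"] * b["gain"],
                           lambda b: min(1.0, b["regions"]), raise_gain),
    "labels": LocalExpert(lambda b: b["regions"], accept, no_adapt),
}
board: Blackboard = {"gain": 0.25}
n_adaptations = analyse(net, experts, board)
```

In this toy run the `regions` expert rates its first result below the threshold, raises `gain` once, and the second pass is accepted. The sketch loops once per round of adaptations, which loosely mirrors the statement that the analysis time depends on the number of adaptations triggered by the local experts.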