We propose a visual scene interpretation system that enables cognitive robots to maintain a consistent world model of their environments. The system is part of our lifelong experimental learning framework, which allows robots to analyze failure contexts in order to ensure robustness in their future tasks. Efficient analysis of failure contexts requires that scenes be interpreted appropriately. In our system, LINE-MOD and HS histograms are used to recognize objects with and without textures. Moreover, depth-based segmentation is applied to identify unknown objects in the scene; this information is also used to improve recognition performance. The world model includes not only the objects detected in the environment but also their spatial relations, allowing contexts to be represented efficiently. Extracting unary and binary relations such as on, on ground, clear, and near is useful for symbolic representation of scenes. We evaluate the performance of our system on recognizing objects, determining spatial predicates, and maintaining a consistent world model in the real world. Our preliminary results reveal that the system can successfully extract spatial relations in a scene and build a consistent model of the world from information gathered by the onboard RGB-D sensor as the robot explores its environment.
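To make the spatial predicates concrete, the following is a minimal sketch of how unary and binary relations like those named in the abstract (on, on ground, clear, near) could be derived from axis-aligned 3D bounding boxes of detected objects. The `Box` class, all thresholds, and the helper names are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch: deriving spatial predicates (on, on_ground, clear,
# near) from axis-aligned 3D bounding boxes. Thresholds are hypothetical.
from dataclasses import dataclass


@dataclass
class Box:
    name: str
    x: float; y: float; z: float     # minimum corner (meters)
    dx: float; dy: float; dz: float  # extents along each axis

    @property
    def top(self) -> float:
        return self.z + self.dz


def _overlap_xy(a: Box, b: Box) -> bool:
    # True if the ground-plane footprints of a and b overlap.
    return (a.x < b.x + b.dx and b.x < a.x + a.dx and
            a.y < b.y + b.dy and b.y < a.y + a.dy)


def on(a: Box, b: Box, tol: float = 0.02) -> bool:
    # a rests on b: footprints overlap and a's bottom meets b's top.
    return _overlap_xy(a, b) and abs(a.z - b.top) <= tol


def on_ground(a: Box, tol: float = 0.02) -> bool:
    # a's bottom is at (or very near) the ground plane z = 0.
    return a.z <= tol


def clear(a: Box, others: list) -> bool:
    # Nothing in the scene rests on top of a.
    return not any(on(o, a) for o in others if o is not a)


def near(a: Box, b: Box, dist: float = 0.15) -> bool:
    # Ground-plane centroid distance below a threshold.
    ca = (a.x + a.dx / 2, a.y + a.dy / 2)
    cb = (b.x + b.dx / 2, b.y + b.dy / 2)
    return ((ca[0] - cb[0]) ** 2 + (ca[1] - cb[1]) ** 2) ** 0.5 <= dist


# Toy scene: a cup sitting on a table.
cup = Box("cup", 0.10, 0.10, 0.30, 0.08, 0.08, 0.10)
table = Box("table", 0.00, 0.00, 0.00, 1.00, 1.00, 0.30)
print(on(cup, table))       # True
print(on_ground(table))     # True
print(clear(cup, [table]))  # True
```

Predicates computed this way can be asserted as symbolic facts (e.g. `on(cup, table)`) each time the world model is updated, which is what makes the scene representation usable for failure-context analysis.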