Describes an interface for multimodal human-robot interaction that enables people to introduce a newcomer robot to the attributes of objects and places in a room through speech commands and hand gestures. The robot builds an environment map of the room from knowledge learned through communication with humans and uses this map for navigation. The developed system consists of several modules: natural language processing, posture recognition, object localization, and map generation. The system combines multiple information sources with model matching to detect and track a human hand, so that the user can point toward an object of interest and either guide the robot to approach it or register that object's position in the room. Object positions are estimated with a monocular camera using a depth-from-focus method.
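The core idea of depth from focus mentioned above is to sweep the camera's focal setting, score each frame for sharpness, and take the focus distance of the sharpest frame as the object's depth. The sketch below illustrates this with a common sharpness score (variance of a discrete Laplacian); the function names and the use of a precomputed focal stack are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def focus_measure(img):
    # Variance of a 4-neighbor discrete Laplacian: blur suppresses
    # high spatial frequencies, so sharper frames score higher.
    lap = (-4.0 * img[1:-1, 1:-1]
           + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return lap.var()

def depth_from_focus(focal_stack, focus_distances):
    # focal_stack: grayscale frames of the same scene, one per focus
    # setting; focus_distances: the lens focus distance for each frame.
    # Return the distance whose frame is sharpest.
    scores = [focus_measure(frame) for frame in focal_stack]
    return focus_distances[int(np.argmax(scores))]
```

In practice the score would be computed over a small window around the detected object rather than the whole frame, so that each object gets its own depth estimate.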
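A common way to resolve a pointing gesture like the one described above is to cast a ray through two tracked body points (e.g. head and hand) and intersect it with the floor plane; objects near the intersection are the likely referents. This is a minimal geometric sketch under that assumption, not the paper's specific tracking pipeline.

```python
import numpy as np

def pointed_floor_location(head, hand, floor_z=0.0):
    # Extend the head->hand ray until it meets the horizontal
    # plane z = floor_z; returns None if the ray never descends.
    head = np.asarray(head, dtype=float)
    hand = np.asarray(hand, dtype=float)
    direction = hand - head
    if direction[2] >= 0.0:  # pointing level or upward: no floor hit
        return None
    t = (floor_z - head[2]) / direction[2]
    return head + t * direction
```

The robot can then pick the mapped object whose stored position lies closest to the returned floor point.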