Traditionally, robots may learn to perform tasks by observation in clean or sterile environments. However, robots are unable to accurately learn tasks by observation in real environments (e.g., cluttered, noisy, chaotic environments). Methods and systems are provided for teaching robots to learn tasks in real environments based on input (e.g., verbal or textual cues). In particular, a verbal-based Focus-of-Attention (FOA) model receives input, parses the input to recognize at least a task and a target object name. This information is used to spatio-temporally filter a demonstration of the task to allow the robot to focus on the target object and movements associated with the target object within a real environment. In this way, using the verbal-based FOA, a robot is able to recognize “where and when” to pay attention to the demonstration of the task, thereby enabling the robot to learn the task by observation in a real environment.
展开▼