General-purpose robots of the future will need to robustly perceive and understand their local environment in order to attain various goals. Currently, computer vision research often focusses narrowly on particular tasks, such as object detection and recognition, semantic segmentation, caption generation, or pose estimation. However, each task put to a general-purpose robot could require different information to be gleaned from a visual sensor, necessitating a general-purpose vision system. We can envision such a system as a generic visual information extractor that processes an image and produces a representation of its content. This representation should be sufficient for completing a wide array of potential tasks. The specific, purely visual tasks listed above could then be solved by selectively extracting only the task-relevant information (e.g. the class of an object or the semantic label of a particular pixel).