When interaction concerns the physical world, interfaces should exploit direct observation to find relevant information. Image-based interfaces have been tried in the past, but they generally required artificial barcode tags to be affixed to each viewed object or surface. Recent advances in computer vision and content-based image retrieval have enabled fast and robust indexing from images of individual objects (CD covers, book jackets, magazine advertisements, and so on), even on relatively low-power platforms such as camera-equipped mobile phones. I'll review the relevant algorithms and the design of such systems, and discuss what types of image recognition interfaces are feasible in the near term. I'll describe very recent work on multimodal question answering interfaces, which combine image and text query matching with human-in-the-loop interaction. I'll close with a discussion of anticipated future progress on category-level visual recognition, and the classes of interfaces it may enable.
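To make the retrieval idea concrete: content-based image retrieval systems of the kind mentioned above typically reduce each database image to a fixed-length descriptor (for example, a histogram of quantized local features) and answer a query by nearest-neighbor search over those descriptors. The sketch below is purely illustrative and not from the talk: descriptors are simulated with random vectors, the object names are hypothetical, and a real system would compute descriptors from actual images with local-feature extraction.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_descriptor(dim=128):
    """Stand-in for a global image descriptor (e.g., a visual-word
    histogram built from quantized local features); here just a
    random unit vector for illustration."""
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

# Index: one descriptor per database object (names are hypothetical).
database = {name: make_descriptor()
            for name in ["cd_cover", "book_jacket", "magazine_ad"]}

def retrieve(query, db):
    """Return the database entry whose descriptor has the highest
    cosine similarity to the query descriptor."""
    return max(db, key=lambda name: float(query @ db[name]))

# A photo of a known object yields a perturbed version of its descriptor.
noisy = database["book_jacket"] + 0.05 * rng.standard_normal(128)
noisy /= np.linalg.norm(noisy)
print(retrieve(noisy, database))  # → book_jacket
```

On a phone-scale database, this linear scan is replaced by an approximate nearest-neighbor index (e.g., a vocabulary tree or hashing scheme) so that matching stays fast as the collection grows.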