Recognition of visual object classes.

Abstract

Humans can look at a scene or a photograph and easily recognize objects. Outside my window I can see cars, people walking a dog on a brick pathway, trees, buildings, etc. This perception is so effortless that it belies the difficulty of the task. Visual perception begins with the light that is reflected from the scene into the eye. The light impinges upon the retina and is transduced by a two-dimensional array of photoreceptors into noisy electrical signals. The brain must then accomplish the difficult task of transforming from this low-level representation to a higher-level understanding of the scene in terms of regions, surfaces, textures, and objects.

For computer vision the problem is the same, but the hardware is different. A camera approximates the function of the eye and retina; that is, the camera produces a two-dimensional array of numbers (pixel values) representing the intensity of light reflected from the scene. The fundamental question addressed in this thesis is the following: what mathematical processing should be applied to the pixel values in order for a computer to recognize objects? The methods we propose are not intended as a model of human brain function, although they may provide some insight. We are simply trying to solve the same visual recognition problems as the brain without concern for whether (or how) our algorithms could be realized in neuronal "hardware."

We have developed a new framework for recognizing visual object classes in which the class members consist of characteristic parts in a deformable spatial configuration. Human faces are an object class of this type, since faces consist of eyes, nose, and mouth arranged in a configuration that varies depending on expression and pose and also from one person to another. A second object class is cursive handwriting, which consists of loops, cusps, crossings, etc. arranged in a deformable pattern. In our approach, the allowed object deformations are represented through shape statistics, which are learned from examples. Instances of an object in an image are detected by finding the appropriate features in the correct spatial configuration. Our algorithm is robust with respect to partial occlusion, detector false alarms, and missed features.

Potential applications include intelligent tools for finding objects in image databases, human-machine interfaces, user authentication, intelligent data gathering and compression, signature verification, and keyword spotting. Experimental results will be presented for two problems: (1) locating quasi-frontal views of human faces in cluttered scenes and with occlusions and (2) spotting keywords in on-line cursive handwriting data.
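
To make the idea of "features in the correct spatial configuration" concrete, the following is a minimal illustrative sketch and not the method of the thesis. It assumes that the shape statistics can be approximated by a single Gaussian over translation-normalized part coordinates, that every part has at least one candidate detection, and that the best configuration can be found by brute-force search. The function names (`normalize`, `fit_shape_model`, `shape_cost`, `best_configuration`) and the `ridge` parameter are hypothetical. Handling of missed features and occlusion, which the abstract mentions, is not modeled here.

```python
import itertools
import numpy as np

def normalize(points):
    """Remove translation by centering part locations on their centroid."""
    points = np.asarray(points, dtype=float)
    return (points - points.mean(axis=0)).ravel()

def fit_shape_model(examples, ridge=1e-3):
    """Estimate mean and inverse covariance of centered part layouts.

    examples: sequence of (n_parts, 2) arrays of labeled part locations.
    ridge: assumed regularizer; centering makes the covariance singular.
    """
    X = np.stack([normalize(e) for e in examples])
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) + ridge * np.eye(X.shape[1])
    return mean, np.linalg.inv(cov)

def shape_cost(points, mean, inv_cov):
    """Squared Mahalanobis distance of a layout from the learned shape."""
    d = normalize(points) - mean
    return float(d @ inv_cov @ d)

def best_configuration(candidates, mean, inv_cov):
    """Pick one detection per part so the layout best matches the model.

    candidates: list with one entry per part, each a list of (x, y)
    detections that may include false alarms. Exhaustive search is used
    for clarity; it is exponential in the number of parts.
    """
    best = min(itertools.product(*candidates),
               key=lambda c: shape_cost(c, mean, inv_cov))
    return best, shape_cost(best, mean, inv_cov)
```

In this sketch, a detector false alarm is rejected only because it produces a layout with a high shape cost; a practical system would also need a threshold on that cost and a way to score configurations with missing parts.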