Learning Hierarchical Feature Extractors For Image Recognition.

Abstract

Telling cow from sheep is effortless for most animals, but requires much engineering for computers. In this thesis, we seek to tease out basic principles that underlie many recent advances in image recognition. First, we recast many methods into a common unsupervised feature extraction framework based on an alternation of coding steps, which encode the input by comparing it with a collection of reference patterns, and pooling steps, which compute an aggregation statistic summarizing the codes within some region of interest of the image. Within that framework, we conduct extensive comparative evaluations of many coding or pooling operators proposed in the literature. Our results demonstrate a robust superiority of sparse coding (which decomposes an input as a linear combination of a few visual words) and max pooling (which summarizes a set of inputs by their maximum value). We also propose macrofeatures, which import into the popular spatial pyramid framework the joint encoding of nearby features commonly practiced in neural networks, and obtain significantly improved image recognition performance. Next, we analyze the statistical properties of max pooling that underlie its better performance, through a simple theoretical model of feature activation. We then present results of experiments that confirm many predictions of the model. Beyond the pooling operator itself, an important parameter is the set of pools over which the summary statistic is computed. We propose locality in feature configuration space as a natural criterion for devising better pools. Finally, we propose ways to make coding faster and more powerful through fast convolutional feedforward architectures, and examine how to incorporate supervision into feature extraction schemes. Overall, our experiments offer insights into what makes current systems work so well, and state-of-the-art results on several image recognition benchmarks.
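The coding and pooling alternation described in the abstract can be illustrated with a small NumPy sketch. This is not the thesis's implementation: the function names (`hard_code`, `sparse_code`, `pool`), the toy codebook, and the toy descriptors are all hypothetical. Hard vector quantization stands in as the simplest coding operator, and the sparse coding step is approximated with a plain ISTA loop on an L1-penalized least-squares objective, which is an assumption about the solver rather than something the abstract specifies.

```python
import numpy as np

def hard_code(descriptors, codebook):
    """Coding by hard vector quantization: one-hot code of the nearest codeword."""
    # Squared distances between every descriptor (n, d) and codeword (k, d).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    codes = np.zeros_like(d2)
    codes[np.arange(len(descriptors)), d2.argmin(axis=1)] = 1.0
    return codes

def sparse_code(x, codebook, lam=0.1, n_iter=200):
    """Coding by sparse decomposition (ISTA sketch, assumed solver).

    Approximately minimizes 0.5 * ||x - a @ codebook||^2 + lam * ||a||_1,
    so x is expressed as a linear combination of a few codewords.
    """
    a = np.zeros(codebook.shape[0])
    lipschitz = np.linalg.norm(codebook, 2) ** 2  # squared spectral norm
    for _ in range(n_iter):
        grad = (a @ codebook - x) @ codebook.T
        z = a - grad / lipschitz
        # Soft-thresholding drives small coefficients to exactly zero.
        a = np.sign(z) * np.maximum(np.abs(z) - lam / lipschitz, 0.0)
    return a

def pool(codes, mode="max"):
    """Pooling: summarize all codes in one image region as a single vector."""
    if mode == "max":
        return codes.max(axis=0)
    if mode == "avg":
        return codes.mean(axis=0)
    raise ValueError(f"unknown pooling mode: {mode}")

# Toy data (hypothetical): three 2-D codewords, four local descriptors.
codebook = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
descriptors = np.array([[0.9, 0.1], [1.0, -0.1], [0.1, 0.8], [-0.9, 0.0]])

codes = hard_code(descriptors, codebook)
max_feat = pool(codes, "max")  # [1.0, 1.0, 1.0]
avg_feat = pool(codes, "avg")  # [0.5, 0.25, 0.25]

# Sparse codes for each descriptor, pooled by their maximum absolute value.
sparse_codes = np.array([sparse_code(x, codebook) for x in descriptors])
sparse_max_feat = pool(np.abs(sparse_codes), "max")
```

In this toy region, max pooling only records whether each codeword was activated at all, while average pooling also reflects how often it occurred. The sketch shows only that mechanical difference; it does not reproduce the comparative results reported in the thesis.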

Bibliographic record

  • Author: Boureau, Y-Lan
  • Author affiliation: New York University
  • Degree-granting institution: New York University
  • Subjects: Artificial Intelligence; Computer Science
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 195
  • Original format: PDF
  • Language: English
