Large scale scene matching for graphics and vision.

Abstract

Our visual experience is extraordinarily varied and complex. The diversity of the visual world makes it difficult for computer vision to understand images and for computer graphics to synthesize visual content. But for all its richness, it turns out that the space of "scenes" might not be astronomically large. With access to imagery on an Internet scale, regularities start to emerge: for most images, there exist numerous examples of semantically and structurally similar scenes. Is it possible to sample the space of scenes so densely that one can use similar scenes to "brute force" otherwise difficult image understanding and manipulation tasks? This thesis is focused on exploiting and refining large scale scene matching to short-circuit the typical computer vision and graphics pipelines for image understanding and manipulation.

First, in "Scene Completion" we patch up holes in images by copying content from matching scenes. We find scenes so similar that the manipulations are undetectable to naive viewers, and we quantify our success rate with a perceptual study. Second, in "im2gps" we estimate geographic properties and global geolocation for photos using scene matching with a database of 6 million geo-tagged Internet images. We introduce a range of features for scene matching and use them, together with lazy SVM learning, to dramatically improve scene matching, doubling the performance of single-image geolocation over our baseline method. Third, we study human photo geolocation to gain insights into the geolocation problem, our algorithms, and human scene understanding. This study shows that our algorithms significantly exceed human geolocation performance. Finally, we use our geography estimates, as well as Internet text annotations, to provide context for deeper image understanding, such as object detection.
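To make the scene-matching idea concrete, here is a minimal sketch in Python of the retrieval step: nearest-neighbor search over precomputed global scene descriptors. The thesis line of work matches scenes with holistic descriptors such as GIST together with color information; the `gist_descriptor` function and the descriptor arrays below are hypothetical stand-ins, not the thesis implementation.

```python
import numpy as np

def match_scenes(query_desc, db_descs, k=20):
    """Return indices of the k database scenes whose global
    descriptors are closest (L2) to the query descriptor.

    query_desc : (d,) descriptor of the query image
    db_descs   : (n, d) descriptors of the scene database
    """
    # Squared Euclidean distance from the query to every database scene.
    dists = np.sum((db_descs - query_desc) ** 2, axis=1)
    # Indices of the k smallest distances, then ordered by distance.
    nearest = np.argpartition(dists, k)[:k]
    return nearest[np.argsort(dists[nearest])]

# Hypothetical usage; descriptors would come from e.g. a GIST implementation.
# db = np.load("scene_descriptors.npy")   # (n, d) precomputed database
# q = gist_descriptor(query_image)        # (d,) query descriptor
# best = match_scenes(q, db, k=20)
```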
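For the im2gps step, the retrieved neighbors can then vote for a location through their GPS tags. Below is a minimal sketch of that voting idea using a Gaussian mean-shift mode seek over the neighbors' coordinates; the flat lat/lon treatment, the bandwidth value, and the `gps_tags` array are simplifying assumptions for illustration only.

```python
import numpy as np

def geolocate(neighbor_gps, bandwidth=5.0, iters=20):
    """Estimate a location from the GPS tags (lat, lon in degrees)
    of the nearest-neighbor scenes via a mean-shift mode seek.

    neighbor_gps : (k, 2) array of the neighbors' (lat, lon) tags
    """
    mode = neighbor_gps.mean(axis=0)  # start from the centroid
    for _ in range(iters):
        d2 = np.sum((neighbor_gps - mode) ** 2, axis=1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))  # Gaussian kernel weights
        mode = (w[:, None] * neighbor_gps).sum(axis=0) / w.sum()
    return mode  # (lat, lon) near the densest cluster of neighbor tags

# Hypothetical usage, continuing the matching sketch above:
# best = match_scenes(q, db, k=120)
# estimate = geolocate(gps_tags[best])
```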

Bibliographic record

  • Author

    Hays, James.

  • Affiliation

    Carnegie Mellon University.

  • Degree grantor: Carnegie Mellon University.
  • Subject: Computer Science.
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 148 p.
  • Total pages: 148
  • Format: PDF
  • Language: eng
  • CLC classification:
  • Keywords:
