
Depth Inference and Visual Saliency Detection from 2D Images.



Abstract

With the rapid development of 3D vision technology, recovering depth information from 2D images has become an active research topic. Current solutions depend heavily on structural assumptions about the 2D image, which limits their applications. It remains technically challenging to develop an efficient yet general solution for generating a depth map from a single image. Furthermore, psychological studies indicate that human eyes are particularly sensitive to salient object regions within an image. It is therefore critical to detect salient objects accurately and segment their boundaries well, since small depth errors in these regions lead to intolerable visual distortion. Briefly speaking, the research in this work falls into two categories: depth map inference system design, and salient object detection and segmentation algorithm development.

For depth map inference system design, we propose a novel depth inference system for 2D images and videos. Specifically, we first adopt in-focus region detection and saliency map computation techniques to separate the foreground objects from the remaining background region. After that, a color-based grab-cut algorithm removes the background from the obtained foreground objects by modeling the background. The depth map of the background is then generated by a modified vanishing-point detection method, and key-frame depth maps are propagated to the remaining frames. Finally, to meet the stringent requirements of VLSI chip implementation, such as limited on-chip memory size and real-time processing, we replace some building modules with simplified versions of the in-focus region detection and the mean-shift algorithm. Experimental results show that the proposed solution provides accurate depth maps for 83% of the test images, while other state-of-the-art methods achieve comparable accuracy on only 34% of them.
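The background-depth step described above can be sketched in a few lines. This is a minimal stand-in, not the dissertation's actual method: it assumes depth simply increases toward a given vanishing point, and the function names (`background_depth_from_vanishing_point`, `fuse_depth`) and the uniform foreground depth are our own illustrative choices.

```python
import numpy as np

def background_depth_from_vanishing_point(h, w, vp, far=255, near=0):
    """Assign each background pixel a depth that grows toward the
    vanishing point `vp` = (vx, vy): a crude stand-in for the modified
    vanishing-point method the abstract refers to."""
    ys, xs = np.mgrid[0:h, 0:w]
    dist = np.hypot(xs - vp[0], ys - vp[1])
    dist /= dist.max()  # 0 at the vanishing point, 1 at the farthest pixel
    return (far - dist * (far - near)).astype(np.uint8)

def fuse_depth(bg_depth, fg_mask, fg_depth=30):
    """Paste a uniform 'near' depth over the segmented foreground mask."""
    depth = bg_depth.copy()
    depth[fg_mask] = fg_depth
    return depth
```

In a full pipeline the boolean `fg_mask` would come from the in-focus/saliency detection followed by grab-cut; here it is assumed given.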
This simplified solution targeting VLSI chip implementation has been validated for high accuracy as well as high efficiency on several test video clips.

For salient object detection, inspired by the success of late fusion in semantic analysis and multi-modal biometrics, we model saliency detection as late fusion at the confidence-score level. Specifically, we propose to fuse state-of-the-art saliency models at the score level in a para-boosting learning fashion. First, the saliency maps generated by these models are used as confidence scores. Then, these scores are fed into our para-boosting learner (i.e., a Support Vector Machine (SVM), Adaptive Boosting (AdaBoost), or a Probability Density Estimator (PDE)) to predict the final saliency map. To explore the strength of the para-boosting learners, traditional transformation-based fusion strategies such as Sum, Min, and Max are also applied for comparison. In our application scenario, salient object segmentation is the final goal, so we further propose a novel salient object segmentation scheme using a Conditional Random Field (CRF) graph model. In this segmentation model, we first extract local low-level features, such as the output maps of several saliency models, gradient histograms, and the position of each image pixel. We then train a random forest classifier to fuse the saliency maps into a single high-level feature map using ground-truth annotations. Finally, both low- and high-level features are fed into our CRF and its parameters are learned. The segmentation results are evaluated from two perspectives: region accuracy and contour accuracy. Extensive experimental comparison shows that both our salient object detection and segmentation models outperform state-of-the-art saliency models on ground truth labeled by human subjects and are, so far, the closest to human performance.
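The transformation-based score-level fusion baselines mentioned above (Sum, Min, Max) amount to simple per-pixel operations over a stack of saliency maps. The sketch below is our own minimal illustration, not the dissertation's code; a learned fuser such as an SVM would instead be trained on the per-pixel score vectors that `np.stack` produces.

```python
import numpy as np

def fuse_scores(maps, rule="sum"):
    """Transformation-based score-level fusion of several saliency maps.
    `maps` is a list of HxW arrays already normalized to [0, 1]."""
    stack = np.stack(maps, axis=0)  # shape: (n_models, H, W)
    if rule == "sum":
        fused = stack.mean(axis=0)  # mean keeps the result in [0, 1]
    elif rule == "min":
        fused = stack.min(axis=0)
    elif rule == "max":
        fused = stack.max(axis=0)
    else:
        raise ValueError(f"unknown fusion rule: {rule!r}")
    return fused
```

For example, fusing `[[0.2, 0.8]]` and `[[0.4, 0.6]]` with `rule="max"` yields `[[0.4, 0.8]]`, keeping each pixel's strongest response across models.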

Record details

  • Author

    Wang, Jingwei.

  • Author affiliation

    University of Southern California.

  • Awarding institution University of Southern California.
  • Subject Engineering, Electronics and Electrical.
  • Degree Ph.D.
  • Year 2013
  • Pages 129 p.
  • Total pages 129
  • Format PDF
  • Language eng

  • Date added 2022-08-17 11:40:53
