
Etude du passage à l'échelle des algorithmes de segmentation et de classification en télédétection pour le traitement de volumes massifs de données

Translated title: Study of the scaling-up of segmentation and classification algorithms in remote sensing for the processing of massive data volumes

Abstract

Recent Earth observation space missions will provide optical images of very high spectral, spatial and temporal resolution, which represents a huge amount of data. The objective of this research is to propose innovative algorithms for processing such massive datasets efficiently on resource-constrained devices. Developing new efficient algorithms that guarantee results identical to those obtained without the memory limitation is a challenging task.

The first part of this thesis focuses on adapting segmentation algorithms to the case where the input satellite image cannot be stored in main memory. A naive solution consists of dividing the input image into tiles, segmenting each tile independently, and building the final result by grouping the segmented tiles together. This strategy turns out to be suboptimal, since it modifies the resulting segments compared to those obtained by segmenting the image without tiling. A thorough study of region-merging segmentation algorithms allowed us to develop a tile-based, scalable solution that segments images of arbitrary size while guaranteeing results identical to those obtained without tiling. The feasibility of the solution is demonstrated by segmenting several very high resolution Pléiades images, each requiring gigabytes of memory to store.

The second part of the thesis focuses on supervised learning methods for the case where the training dataset cannot be stored in memory. Within this thesis we study the Random Forest algorithm, which builds an ensemble of decision trees. Several solutions have been proposed to adapt this algorithm to massive training datasets, but they are either approximate, because the memory limitation restricts the algorithm's visibility to a small portion of the training data, or inefficient, because they require many read and write accesses to the hard disk. To solve these issues, we propose an exact solution that gives the algorithm visibility of the whole training dataset while minimizing read and write accesses to the hard disk. The running time, analysed for varying training dataset sizes, shows that the proposed solution is highly competitive with existing ones and can process hundreds of gigabytes of data.
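As a concrete illustration of the tiling discussion above, the following is a minimal Python/NumPy sketch of the naive strategy, not the thesis' algorithm: each tile is segmented independently with a toy region-merging criterion and the per-tile label maps are stitched back together. The helper names, the intensity threshold and the tile size are illustrative assumptions; the point is that segments crossing tile borders come out split, which is precisely the artefact the thesis' tile-based solution eliminates.

```python
import numpy as np

def segment_tile(tile, threshold=10.0):
    """Toy region merging via union-find: every pixel starts as its own
    region, and 4-connected neighbours are merged when their intensities
    differ by less than `threshold` (a stand-in for a real merging
    criterion such as spectral homogeneity)."""
    h, w = tile.shape
    parent = np.arange(h * w)

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    flat = tile.ravel().astype(float)
    for i in range(h * w):
        y, x = divmod(i, w)
        # right neighbour (i + 1) and bottom neighbour (i + w), when they exist
        for j in ([i + 1] if x + 1 < w else []) + ([i + w] if y + 1 < h else []):
            if abs(flat[i] - flat[j]) < threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two regions
    labels = np.fromiter((find(i) for i in range(h * w)), dtype=np.int64)
    return labels.reshape(h, w)

def naive_tiled_segmentation(image, tile_size=64):
    """Segment each tile independently, offsetting labels so they stay
    unique across tiles. In a real out-of-core setting each tile would be
    read from disk instead of sliced from an in-memory array."""
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.int64)
    offset = 0
    for ty in range(0, h, tile_size):
        for tx in range(0, w, tile_size):
            labels = segment_tile(image[ty:ty + tile_size, tx:tx + tile_size])
            out[ty:ty + tile_size, tx:tx + tile_size] = labels + offset
            offset += int(labels.max()) + 1
    return out

# Example: regions straddling a tile border are split in `tiled`
# but not in `whole`.
rng = np.random.default_rng(0)
img = rng.integers(0, 255, size=(128, 128)).astype(np.float32)
tiled = naive_tiled_segmentation(img, tile_size=64)
whole = segment_tile(img)
```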
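For the Random Forest part, here is a rough sketch of the underlying idea, assuming off-the-shelf NumPy and scikit-learn tools and hypothetical file names: the training matrix stays on disk as a memory-mapped array, so every tree is grown with visibility of the whole training set without loading the data into RAM. This only approximates the goal with standard tools; the thesis' exact solution additionally organizes tree construction so as to minimize the number of disk reads and writes.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical on-disk training set, larger than the available RAM.
# The files are assumed to contain raw float32/int32 values.
n_samples, n_features = 500_000_000, 16
X = np.memmap("features.dat", dtype=np.float32, mode="r",
              shape=(n_samples, n_features))
y = np.memmap("labels.dat", dtype=np.int32, mode="r", shape=(n_samples,))

# The operating system pages the mapped arrays in on demand, so the
# forest sees the entire training set; how much is actually read from
# disk depends on the tree builder's access pattern.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
clf.fit(X, y)
```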

Bibliographic details

  • Author

    Lassalle Pierre

  • Affiliation
  • Year: 2015
  • Total pages:
  • Original format: PDF
  • Language: French
  • CLC classification

