The impact of training data characteristics on ensemble classification of land cover

Abstract

Supervised classification of remote sensing imagery has long been recognised as an essential technology for large area land cover mapping. Land cover and forest classification maps derived from remote sensing are important sources of information for understanding environmental processes and informing natural resource management decisions. In recent years, the supervised transformation of remote sensing data into thematic products has advanced through the introduction and development of machine learning classification techniques. Applied to a variety of science and engineering problems over the past twenty years (Lary et al., 2016), machine learning provides greater accuracy and efficiency than traditional parametric classifiers and can deal with large data volumes across complex measurement spaces. The Random forest (RF) classifier in particular has become popular in the remote sensing community, with a range of commonly cited advantages: low parameterisation requirements, excellent classification results, and the ability to handle noisy observation data and outliers in a complex measurement space with training data that are small relative to the study area size. In the context of large area land cover classification for forest cover, using multisource remote sensing and geospatial data, this research sets out to examine two proposed advantages of the RF classifier: insensitivity to training data noise (mislabelling) and the handling of training data class imbalance. Through margin theory, the research also investigates the utility of ensemble learning, in which multiple base classifiers are combined to reduce generalisation error in classification, as a means of designing more efficient classifiers, improving classification performance, and reducing reference (training and test) data redundancy.
The first part of the thesis (chapters 2 and 3) introduces the experimental setting and data used in the research, including a description (in chapter 2) of the sampling framework for the reference data used in the classification experiments that follow. Chapter 3 evaluates the performance of the RF classifier applied across a study area of 7.2 million hectares of public land in Victoria, Australia. This chapter describes an open-source framework for deploying the RF classifier over large areas and processing significant volumes of multisource remote sensing and ancillary spatial data. The second part of the thesis (research chapters 4 through 6) examines the effect of training data characteristics (class imbalance and mislabelling) on the performance of RF, and explores the application of the ensemble margin as a means of both examining RF classification performance and informing training data sampling to improve classification accuracy. Results of the binary and multiclass experiments described in chapter 4 provide insights into the behaviour of RF when training data are not evenly distributed among classes and contain systematically mislabelled instances. Results show that while the error rate of the RF classifier is relatively insensitive to mislabelled training data (in the multiclass experiment, overall Kappa falls from 78.3% with no mislabelled instances to 70.1% with 25% mislabelling in each class), the associated level of confidence falls at a faster rate than overall accuracy as the rate of mislabelled training data increases. This chapter also demonstrates that imbalance can be introduced into the training data to reduce error in the classes that are most difficult to classify. The relationship between per-class and overall classification performance and the diversity of members in an RF ensemble classifier is explored through experiments presented in chapter 5.
This research examines ways of targeting particular training data samples to induce RF ensemble diversity and improve per-class and overall classification performance and efficiency. Through use of the ensemble margin, this study offers insights into the trade-off between ensemble classification accuracy and diversity. The research shows that boosting diversity among RF ensemble members, by emphasising the contribution of lower margin training instances used in the learning process, is an effective means of improving classification performance, particularly for more difficult or rarer classes, and is a way of reducing information redundancy and improving classification efficiency. Chapter 6 looks at the application of the RF classifier for calculating Landscape Pattern Indices (LPIs) from classification prediction maps, and examines the sensitivity of these indices to training data characteristics and to sampling based on the ensemble margin. This research shows that a range of commonly used LPIs are significantly sensitive to training data mislabelling in RF classification, as well as to margin-based training data sampling. In conclusion, this thesis examines two proposed advantages of the popular Random forest machine learning classifier: its relative insensitivity to training data noise (mislabelling) and its ability to handle class imbalance. This research also explores the utility of the ensemble margin for designing more efficient classifiers, measuring and improving classification performance, and designing ensemble classification systems that use reference data more efficiently and effectively, with less data redundancy. These findings have practical applications and implications for large area land cover classification, for which the generation of high quality reference data is often a time consuming, subjective and expensive exercise.
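
The abstract does not state how the ensemble margin is computed, so the following is only a minimal sketch under an assumed, commonly used definition: for one instance, the margin is the number of trees voting for the most-voted class minus the number voting for the second most-voted class, divided by the total number of trees. The function names `ensemble_margin` and `lowest_margin_indices`, the class labels, and the vote table are hypothetical and serve only to illustrate how low margin training instances might be identified for emphasis.

```python
from collections import Counter


def ensemble_margin(votes):
    """Margin of one instance, given the class label voted by each tree.

    Assumed definition (not taken from the thesis):
    (votes for top class - votes for runner-up class) / number of trees.
    Values near 0 mean the ensemble is split; 1 means a unanimous vote.
    """
    if not votes:
        raise ValueError("votes must not be empty")
    ranked = Counter(votes).most_common(2)
    top = ranked[0][1]
    second = ranked[1][1] if len(ranked) > 1 else 0
    return (top - second) / len(votes)


def lowest_margin_indices(vote_table, fraction):
    """Indices of the `fraction` of instances with the smallest margins."""
    margins = [ensemble_margin(v) for v in vote_table]
    k = max(1, round(fraction * len(margins)))
    order = sorted(range(len(margins)), key=lambda i: margins[i])
    return order[:k]


# Hypothetical votes from a 10-tree ensemble for four training instances.
vote_table = [
    ["forest"] * 9 + ["water"],                  # confident
    ["forest"] * 5 + ["grass"] * 4 + ["water"],  # ambiguous
    ["water"] * 10,                              # unanimous
    ["grass"] * 6 + ["forest"] * 4,              # fairly ambiguous
]
margins = [ensemble_margin(v) for v in vote_table]
hard_cases = lowest_margin_indices(vote_table, 0.5)
```

Under this assumed definition, the instances returned by `lowest_margin_indices` are those on which the trees disagree most, which is one plausible reading of the "lower margin training instances" the abstract describes emphasising.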

Bibliographic details

  • Author

    Mellor A

  • Year 2017
  • Original format PDF