Journal: Remote Sensing of Environment: An Interdisciplinary Journal

On the relationship between training sample size and data dimensionality: Monte Carlo analysis of broadband multi-temporal classification



Abstract

The number of training samples per class (n) required for accurate Maximum Likelihood (ML) classification is known to be affected by the number of bands (p) in the input image. However, the general rule that n should be 10p to 30p is often enforced universally in remote sensing without questioning its relevance to the complexity of the specific discrimination problem. Furthermore, identifying this many training samples is often problematic when many classes and/or many bands are used. It is important, then, to test how this generally accepted rule matches common remote sensing discrimination problems because it could be unnecessarily restrictive for many applications. This study was conducted primarily to test whether the general rule defining the relationship between n and p was well suited to ML classification of a relatively simple remote sensing-based discrimination problem. To summarise the mean response of n-to-p for our study site, a Monte Carlo procedure was used to randomly stack various numbers of bands into thousands of separate image combinations that were then classified using an ML algorithm. The bands were randomly selected from a 119-band Enhanced Thematic Mapper-plus (ETM+) dataset comprising 17 images acquired during the 2001-2002 southern hemisphere summer agricultural growing season over an irrigation area in south-eastern Australia. Results showed that the number of training samples needed for accurate ML classification was much lower than the currently accepted rule requires. Due to the asymptotic nature of the relationship, we found that 95% of the accuracy attained using n=30p samples could be achieved by using approximately 2p to 4p samples, or ≤ 1/7th the currently recommended value of n.
Our findings show that the number of training samples needed for a simple discrimination problem is much less than that defined by the general rule and therefore the rule should not be universally enforced; the number of training samples needed should also be determined by considering the complexity of the discrimination problem.
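
The Monte Carlo procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or data: it assumes synthetic Gaussian classes in place of the 119-band ETM+ stack, a plain Gaussian ML classifier with a small ridge term added to each covariance matrix, and hypothetical helper names (make_synthetic_data, fit_gaussian_ml, predict_ml, mean_accuracy). It only shows the shape of the experiment: draw p random bands, train with n = k*p samples per class, classify, and average the accuracy over many trials.

```python
import numpy as np

def make_synthetic_data(rng, n_bands=119, n_classes=3, n_train=400, n_test=200):
    """Synthetic stand-in for the image stack (assumption, not the ETM+ data)."""
    means = rng.normal(0.0, 1.0, size=(n_classes, n_bands))
    def draw(n):
        X = np.concatenate([rng.normal(m, 1.0, size=(n, n_bands)) for m in means])
        return X, np.repeat(np.arange(n_classes), n)
    return draw(n_train), draw(n_test)

def fit_gaussian_ml(X, y):
    """Estimate mean, inverse covariance and log-determinant for each class."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # The ridge term is an assumption that keeps the matrix invertible.
        cov = np.cov(Xc, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        _, logdet = np.linalg.slogdet(cov)
        params[c] = (Xc.mean(axis=0), np.linalg.inv(cov), logdet)
    return params

def predict_ml(params, X):
    """Assign each row to the class with the highest Gaussian log-likelihood."""
    classes = sorted(params)
    scores = []
    for c in classes:
        mean, inv_cov, logdet = params[c]
        d = X - mean
        maha = np.einsum("ij,jk,ik->i", d, inv_cov, d)
        scores.append(-0.5 * (maha + logdet))
    return np.array(classes)[np.argmax(scores, axis=0)]

def mean_accuracy(multiplier, p, trials, rng, train, test):
    """Average test accuracy over random p-band subsets with n = multiplier * p."""
    (X_pool, y_pool), (X_test, y_test) = train, test
    n = multiplier * p
    accs = []
    for _ in range(trials):
        bands = rng.choice(X_pool.shape[1], size=p, replace=False)
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y_pool == c), size=n, replace=False)
            for c in np.unique(y_pool)
        ])
        params = fit_gaussian_ml(X_pool[idx][:, bands], y_pool[idx])
        pred = predict_ml(params, X_test[:, bands])
        accs.append(np.mean(pred == y_test))
    return float(np.mean(accs))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train, test = make_synthetic_data(rng)
    baseline = mean_accuracy(30, p=6, trials=20, rng=rng, train=train, test=test)
    for k in (2, 4, 10, 30):
        acc = mean_accuracy(k, p=6, trials=20, rng=rng, train=train, test=test)
        print(f"n = {k}p: accuracy {acc:.3f}, ratio to 30p baseline {acc / baseline:.3f}")
```

The printed ratios depend entirely on the synthetic data and should not be read as a reproduction of the reported results. As a check on the abstract's arithmetic, the upper end of the reported range, 4p, is 4/30 = 2/15 ≈ 0.13 of the 30p recommendation, which is indeed below 1/7 ≈ 0.14.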
