Applied Sciences

Impacts of Sample Design for Validation Data on the Accuracy of Feedforward Neural Network Classification


Abstract

Validation data are often used to evaluate the performance of a trained neural network and to select the network deemed optimal for the task at hand. Optimality is commonly assessed with a measure such as overall classification accuracy, which is often calculated directly from a confusion matrix showing the counts of cases in the validation set with particular labelling properties. The sample design used to form the validation set can, however, influence the estimated magnitude of the accuracy. The validation set is commonly formed with a stratified sample to give balanced classes, but it is also formed via random sampling, which reflects class abundance. It is suggested that if the ultimate aim is to accurately classify a dataset in which the classes vary in abundance, a validation set formed via random, rather than stratified, sampling is preferred. This is illustrated with the classification of simulated and remotely sensed datasets. With both datasets, the use of validation sets formed via random and stratified sampling led to statistically significant differences in the accuracy with which the data could be classified (z = 2.7 and 1.9 for the simulated and real datasets, respectively; p < 0.05 in both cases). The classifications that used a stratified sample in validation were less accurate, because cases of an abundant class were commissioned into a rarer class. Simple means to address the issue are suggested.
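The effect of sample design on overall accuracy can be sketched numerically. This is a minimal illustration, not the paper's experiment: it assumes a two-class problem, a confusion matrix with rows as reference classes and columns as predicted classes, and two hypothetical networks (`net_1`, `net_2`) whose per-class accuracies are invented. The helper names are likewise hypothetical.

```python
import numpy as np


def overall_accuracy(confusion):
    """Overall accuracy: correctly allocated cases (the diagonal) over all cases."""
    confusion = np.asarray(confusion, dtype=float)
    return float(np.trace(confusion) / confusion.sum())


def two_class_confusion(per_class_accuracy, class_counts):
    """Hypothetical 2x2 confusion matrix (rows = reference class, columns =
    predicted class). Reference class k contributes class_counts[k] cases, of
    which a fraction per_class_accuracy[k] is labelled correctly; the rest are
    commissioned into the other class."""
    acc = np.asarray(per_class_accuracy, dtype=float)
    n = np.asarray(class_counts, dtype=float)
    correct = acc * n
    wrong = n - correct
    return np.array([[correct[0], wrong[0]],
                     [wrong[1], correct[1]]])


# Hypothetical networks: class 0 is abundant, class 1 is rare.
networks = {"net_1": (0.97, 0.60), "net_2": (0.85, 0.90)}
stratified_counts = (500, 500)  # balanced (stratified) validation sample
random_counts = (900, 100)      # random sample under an assumed 90:10 abundance

for name, acc in networks.items():
    oa_strat = overall_accuracy(two_class_confusion(acc, stratified_counts))
    oa_rand = overall_accuracy(two_class_confusion(acc, random_counts))
    print(f"{name}: stratified={oa_strat:.3f}  random={oa_rand:.3f}")
```

With these invented numbers the stratified sample ranks `net_2` first (0.875 against 0.785), while the random sample ranks `net_1` first (0.933 against 0.855). `net_2` gains on the rare class by commissioning 15% of the abundant class into it, which a balanced sample under-weights.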

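The abstract reports z values but does not state which test produced them. One common option for comparing two accuracies estimated from independent validation samples is a two-proportion z-test, sketched below under that assumption; if both classifications were assessed on the same cases, a paired test such as McNemar's would be the usual choice instead. The accuracies and sample sizes in the usage lines are hypothetical.

```python
import math


def z_two_proportions(p1, n1, p2, n2):
    """z statistic for the difference between two accuracies (proportions
    correct) estimated from independent samples of sizes n1 and n2."""
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / standard_error


def normal_upper_tail(z):
    """One-sided p-value: P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical: 90% vs 85% accuracy, each measured on 500 independent cases.
z = z_two_proportions(0.90, 500, 0.85, 500)
print(f"z = {z:.2f}, one-sided p = {normal_upper_tail(z):.4f}, "
      f"two-sided p = {2 * normal_upper_tail(abs(z)):.4f}")
```

For reference, the one-sided upper-tail probabilities of the standard normal distribution at the reported values are about 0.029 (z = 1.9) and 0.0035 (z = 2.7).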

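The abstract does not describe the suggested remedies. One standard technique, not necessarily the one proposed in the paper, is to keep a stratified validation sample but re-weight each class's proportion correct by that class's abundance, so the estimate refers to the class mix of the dataset actually to be classified. This sketch assumes rows are reference classes, that sampling was stratified by reference class, and that class proportions are known; the matrix, the 90:10 proportions and the function name are hypothetical.

```python
import numpy as np


def abundance_weighted_accuracy(confusion, class_proportions):
    """Estimate overall accuracy for a population with the given class
    proportions from a confusion matrix sampled stratified by reference class
    (rows = reference class, columns = predicted class)."""
    confusion = np.asarray(confusion, dtype=float)
    weights = np.asarray(class_proportions, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("class proportions must sum to 1")
    # Proportion of each reference class that was labelled correctly.
    per_class = np.diag(confusion) / confusion.sum(axis=1)
    return float(np.dot(weights, per_class))


# Hypothetical balanced validation sample: 500 cases per reference class.
balanced = [[485, 15],
            [200, 300]]
# Equal weights reproduce the unweighted overall accuracy of this balanced sample.
print(f"{abundance_weighted_accuracy(balanced, [0.5, 0.5]):.3f}")
# Weights for an assumed 90:10 class abundance.
print(f"{abundance_weighted_accuracy(balanced, [0.9, 0.1]):.3f}")
```

With these counts the unweighted overall accuracy is 0.785, whereas weighting by a 90:10 abundance gives 0.933, the value a random sample from that population would be expected to yield.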