Journal: Knowledge-Based Systems

New label noise injection methods for the evaluation of noise filters

Abstract

Noise is often present in the real datasets used to train Machine Learning classifiers. Its disruptive effects on the learning process may include increased complexity of the induced models, longer processing time, and reduced predictive power when classifying new examples. Therefore, treating noisy data in a preprocessing step is crucial for improving data quality and reducing these harmful effects. Various filters use different concepts to identify noisy examples in a dataset. Their noise preprocessing ability is usually assessed by how well they identify artificial noise injected into one or more datasets. This is done to overcome the limitation that only a domain expert can guarantee whether a real example is indeed noisy. The most frequently used label noise injection method is the noise at random method, in which a percentage of the training examples have their labels randomly exchanged, regardless of the characteristics and example space positions of the selected examples. This paper proposes two novel methods to inject label noise into classification datasets. These methods, based on complexity measures, can produce more challenging and realistic noisy datasets by disturbing the labels of critical examples situated close to the decision borders, and can improve the evaluation of noise filters. An extensive experimental evaluation of different noise filters is performed using public datasets with imputed label noise, and the influence of the noise injection methods is compared in both the data preprocessing and classification steps.
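
To make the contrast in the abstract concrete, the sketch below shows one possible reading of noise at random injection next to a simple borderline-biased variant. The abstract does not specify the paper's two methods or the complexity measures they use, so `borderline_noise` is only a hypothetical stand-in: it ranks examples by the fraction of their k nearest neighbours that carry a different label and flips the highest-ranked ones. All function names, the `rate` and `k` parameters, and the use of NumPy are assumptions made for illustration, and both functions assume at least two classes.

```python
import numpy as np


def _flip_labels(y, idx, rng):
    """Return a copy of y where each index in idx gets a different class label."""
    classes = np.unique(y)
    y_noisy = y.copy()
    for i in idx:
        # Candidate labels are all classes except the current one.
        others = classes[classes != y[i]]
        y_noisy[i] = rng.choice(others)
    return y_noisy


def noise_at_random(y, rate, rng):
    """Noise at random: flip a fraction `rate` of labels chosen uniformly,
    ignoring where the examples lie in the feature space."""
    y = np.asarray(y)
    n_noisy = int(round(rate * len(y)))
    idx = rng.choice(len(y), size=n_noisy, replace=False)
    return _flip_labels(y, idx, rng), idx


def borderline_noise(X, y, rate, rng, k=5):
    """Hypothetical borderline-biased injection (not the paper's method):
    flip the examples whose k nearest neighbours disagree most with them."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise squared Euclidean distances; O(n^2) memory, small datasets only.
    dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # an example is not its own neighbour
    neighbours = np.argsort(dist, axis=1, kind="stable")[:, :k]
    # Fraction of neighbours with a different label: high near class borders.
    disagreement = (y[neighbours] != y[:, None]).mean(axis=1)
    n_noisy = int(round(rate * len(y)))
    idx = np.argsort(-disagreement, kind="stable")[:n_noisy]
    return _flip_labels(y, idx, rng), idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy 1-D data: class 0 on the left, class 1 on the right.
    X = np.arange(20, dtype=float).reshape(-1, 1)
    y = np.array([0] * 10 + [1] * 10)
    _, random_idx = noise_at_random(y, 0.1, rng)
    _, border_idx = borderline_noise(X, y, 0.1, rng, k=2)
    print("randomly chosen:", sorted(random_idx.tolist()))
    print("borderline chosen:", sorted(border_idx.tolist()))
```

In this toy setting the random variant may corrupt examples anywhere, while the borderline variant concentrates the flipped labels where the two classes meet, which is the kind of harder case the abstract argues a noise filter should be evaluated on.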