Journal: Expert Systems with Applications

A data sampling and attribute selection strategy for improving decision tree construction



Abstract

Decision trees are an efficient means of building classification models because of the compressibility, simplicity, and ease of interpretation of their results. However, the construction phase often outputs large trees that are affected by many uncertainties in the data (particularity, noise, and residual variation). Combining attribute selection and data sampling is one of the most promising research directions for overcoming these construction problems. However, the search space composed of all possible combinations of subsets of training samples and attributes is extremely large. In this paper, a novel approach is presented that generates an optimized decision tree by selecting an optimal pair of training-sample and attribute subsets for training. Because the search space of candidate pairs is so large, we use particle swarm optimization to make the search for an "optimal" solution tractable. The selected solution helps avoid the over-fitting and complexity problems that arise in the construction phase of decision trees. We conducted an extensive experimental evaluation on 22 datasets from the UCI Machine Learning Repository. The results show that the proposed approach outperforms state-of-the-art classical as well as evolutionary decision tree construction methods in terms of simplicity, accuracy, and F-measure. We further evaluate our approach on a real-world engineering application: condition monitoring of rotating machinery under severe non-stationary conditions. The results show that the proposed approach makes it possible to optimize the use of instantaneous angular speed to diagnose gear defects. (C) 2019 Elsevier Ltd. All rights reserved.
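The joint search described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the particle encoding, the fitness function, or the PSO variant. The sketch assumes a binary PSO with a sigmoid transfer function, a particle encoded as one bit per training sample followed by one bit per attribute, a depth-1 tree (stump) standing in for a full decision tree, and a fitness equal to validation accuracy minus a small penalty per selected attribute. All names (`binary_pso`, `make_fitness`, `train_stump`, `make_toy_data`) and parameter values are hypothetical.

```python
import math
import random

def train_stump(rows, labels, attrs):
    """Fit a depth-1 tree (stump) on `rows`, splitting only on attributes in `attrs`."""
    best = None  # (training errors, attribute, threshold, label predicted at or below threshold)
    for a in attrs:
        for thr in sorted({r[a] for r in rows}):
            for below in (0, 1):
                errs = sum((below if r[a] <= thr else 1 - below) != y
                           for r, y in zip(rows, labels))
                if best is None or errs < best[0]:
                    best = (errs, a, thr, below)
    return best[1:]

def predict(stump, row):
    a, thr, below = stump
    return below if row[a] <= thr else 1 - below

def make_fitness(X_tr, y_tr, X_val, y_val, penalty=0.01):
    """Assumed fitness: validation accuracy minus a penalty per selected attribute."""
    n, m = len(X_tr), len(X_tr[0])
    def fitness(mask):
        samples = [i for i in range(n) if mask[i]]       # first n bits: training samples
        attrs = [j for j in range(m) if mask[n + j]]     # last m bits: attributes
        if not samples or not attrs:
            return 0.0  # an empty subset cannot train a tree
        stump = train_stump([X_tr[i] for i in samples],
                            [y_tr[i] for i in samples], attrs)
        acc = sum(predict(stump, r) == y for r, y in zip(X_val, y_val)) / len(X_val)
        return acc - penalty * len(attrs)
    return fitness

def binary_pso(fitness, n_bits, n_particles=15, n_iters=30,
               w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximise fitness(mask) over bit masks with a sigmoid-transfer binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_bits):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                v = max(-4.0, min(4.0, v))  # clamp the velocity
                vel[i][d] = v
                # The bit becomes 1 with probability sigmoid(v).
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-v)) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

def make_toy_data(n, rng, flip=0.0):
    """Synthetic data: the label depends on attribute 0 only; `flip` adds label noise."""
    X = [[rng.random() for _ in range(4)] for _ in range(n)]
    y = [int(r[0] > 0.5) for r in X]
    y = [1 - v if rng.random() < flip else v for v in y]
    return X, y

rng = random.Random(42)
X_tr, y_tr = make_toy_data(30, rng, flip=0.2)  # noisy training set
X_val, y_val = make_toy_data(30, rng)          # clean validation set
fitness = make_fitness(X_tr, y_tr, X_val, y_val)
mask, score = binary_pso(fitness, n_bits=len(X_tr) + 4, seed=1)
print("selected attributes:", [j for j in range(4) if mask[len(X_tr) + j]])
print("fitness of best pair:", score)
```

Comparing `score` with `fitness([1] * 34)`, the fitness of training on every sample and every attribute, gives a rough sense of whether the search helped on this toy data. A realistic setup would replace the stump with a full tree learner and the toy data with a real dataset.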
