
Cost-Sensitive Perceptron Decision Trees for Imbalanced Drifting Data Streams



Abstract

Mining streaming and drifting data is among the most popular contemporary applications of machine learning methods. Because instances arrive rapidly and in potentially unbounded numbers, concepts evolve, and computational resources are limited, there is a need for efficient and adaptive algorithms that can handle such problems. These learning difficulties are compounded when skewed class distributions appear as the stream progresses. Class imbalance in non-stationary scenarios is highly challenging, because not only the imbalance ratio but also the relationships among classes may change over time. In this paper we propose an efficient and fast cost-sensitive decision tree learning scheme for handling online class imbalance. In each leaf of the tree we train a perceptron with output adaptation to compensate for skewed class distributions, while McDiarmid's bound is used to control the selection of the splitting attribute. The cost matrix automatically adapts to the current imbalance ratio in the stream, allowing for smooth compensation of evolving class relationships. Furthermore, we analyze the characteristics of minority class instances and incorporate this information into the model update process. This allows our classifier to focus on the most difficult instances, while a sliding window keeps track of changes in class structures. Experimental analysis carried out on a number of binary and multi-class imbalanced data streams indicates the usefulness of the proposed approach.
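The abstract does not specify how the cost matrix is derived from the imbalance ratio or how the perceptron outputs are adapted, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes costs are set to the ratio of the largest class count to each class's count within a sliding window, and that output adaptation is a simple multiplicative rescaling of non-negative class scores. The names `WindowedCostTracker` and `adapt_outputs` are hypothetical and do not come from the paper.

```python
from collections import Counter, deque


class WindowedCostTracker:
    """Tracks class counts over a sliding window and derives per-class costs.

    Assumption (not from the paper): cost(c) = largest class count / count(c).
    """

    def __init__(self, window_size=1000):
        self.window = deque(maxlen=window_size)
        self.counts = Counter()

    def update(self, label):
        # If the window is full, the deque will drop its oldest label on
        # append, so remove that label from the counts first.
        if len(self.window) == self.window.maxlen:
            oldest = self.window[0]
            self.counts[oldest] -= 1
            if self.counts[oldest] == 0:
                del self.counts[oldest]
        self.window.append(label)
        self.counts[label] += 1

    def costs(self):
        # Rarer classes in the current window receive proportionally
        # higher costs; the most frequent class has cost 1.0.
        if not self.counts:
            return {}
        largest = max(self.counts.values())
        return {c: largest / n for c, n in self.counts.items()}


def adapt_outputs(scores, costs):
    """Rescale non-negative per-class scores by cost; return the winning class."""
    adjusted = {c: s * costs.get(c, 1.0) for c, s in scores.items()}
    return max(adjusted, key=adjusted.get)
```

With a window holding three instances of class 0 and one of class 1, this sketch assigns class 1 a cost three times that of class 0, so a raw score of 0.3 for class 1 outweighs a raw score of 0.6 for class 0. As older instances leave the window, the costs follow the current class ratio rather than the ratio over the whole stream.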
