Adaptive learning from data flow and imbalanced data.

Abstract

This dissertation presents studies of adaptive machine learning algorithms designed to address two problems: learning from data flow (streaming data) and learning from data sets with imbalanced class distributions. These problems are of particular importance to the machine learning community because they are prevalent in many areas of the modern business world, such as medical diagnosis and online fraud detection.

Unlike conventional machine learning settings, a data flow becomes available continuously over time. The challenge in this scenario is to transform large volumes of raw stream data into information and knowledge representations, and to accumulate experience over time to support future decision-making. To this end, this dissertation introduces an ADAptive INcremental learning (ADAIN) framework. Starting with a description of the system-level architecture, the research explores the design of a mapping function for knowledge transfer, studies the error bound theoretically, and concludes by presenting simulation results on a video clip as well as on other real-world data sets.

Traditional machine learning techniques generally fail to achieve satisfactory performance on imbalanced data because they assume a balanced class distribution. Building on adaptive over-sampling and boosting, this dissertation presents the RAnked Minority Over-Sampling in Boosting (RAMOBoost) algorithm to alleviate this difficulty. Integrated with the AdaBoost learning framework, RAMOBoost adaptively ranks minority class examples at each iteration: the probability that a minority example is sampled to create a synthetic instance is proportional to the ratio of majority class examples among its k nearest neighbors. In this way, the decision boundary can be progressively shifted toward the difficult-to-learn minority class and majority class examples simultaneously.
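
The ranking step described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the dissertation's exact formulation: it takes the abstract's wording literally (sampling probability proportional to the fraction of majority-class examples among the k nearest neighbors), uses Euclidean distance, and creates synthetic instances by SMOTE-style linear interpolation toward a nearby minority example. The function names `minority_sampling_probs` and `synthesize` are hypothetical, and using one `k` for both neighbor searches is a simplification. In the full algorithm this sampling would be repeated at every AdaBoost iteration, with the boosting weights emphasizing hard majority-class examples; that outer loop is omitted here.

```python
import numpy as np


def minority_sampling_probs(X, y, minority_label=1, k=5):
    """Return minority indices and their (hypothetical) sampling probabilities.

    Each minority example's probability is proportional to the fraction of
    majority-class examples among its k nearest neighbors (Euclidean distance,
    searched over the whole training set, excluding the example itself).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    minority_idx = np.flatnonzero(y == minority_label)
    if len(minority_idx) == 0:
        raise ValueError("no minority examples found")
    # Distances from every minority example to every training example.
    d = np.linalg.norm(X[minority_idx, None, :] - X[None, :, :], axis=2)
    d[np.arange(len(minority_idx)), minority_idx] = np.inf  # ignore self-matches
    nn = np.argsort(d, axis=1)[:, :k]
    ratio = (y[nn] != minority_label).sum(axis=1) / k
    if ratio.sum() == 0:
        # Every minority example sits in a pure minority neighborhood:
        # fall back to uniform sampling.
        return minority_idx, np.full(len(minority_idx), 1.0 / len(minority_idx))
    return minority_idx, ratio / ratio.sum()


def synthesize(X, y, n_new, minority_label=1, k=5, rng=None):
    """Create n_new synthetic minority examples (SMOTE-style, an assumption).

    Seeds are drawn with the ranked probabilities above; each seed is
    interpolated toward one of its k nearest minority-class neighbors.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    minority_idx, p = minority_sampling_probs(X, y, minority_label, k)
    X_min = X[minority_idx]
    if len(X_min) < 2:
        raise ValueError("need at least two minority examples to interpolate")
    # Nearest minority neighbors of each minority example (self excluded).
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    k_min = min(k, len(X_min) - 1)
    nn_min = np.argsort(d, axis=1)[:, :k_min]
    seeds = rng.choice(len(X_min), size=n_new, p=p)
    partners = nn_min[seeds, rng.integers(0, k_min, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[seeds] + gap * (X_min[partners] - X_min[seeds])
```

On a toy set where four minority points form a tight cluster and a fifth minority point sits among majority points, the isolated point receives all of the sampling probability (with `k=3`), so every synthetic instance is generated from the hard-to-learn region.
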
The dissertation gives an algorithmic description of RAMOBoost and reports extensive simulations validating its competitiveness against existing methods.

This dissertation also explores a Recursive Ensemble Approach (REA) for learning from nonstationary data flow with imbalanced class distributions. For each new data chunk, REA adds a selected subset of minority class examples from previous chunks to balance the chunk's class distribution, builds a classifier on the result, and adds that classifier to an ensemble used for future prediction. Both theoretical and empirical studies are conducted to compare REA with other methods.
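
The chunk-by-chunk REA loop described above can be sketched as follows, in Python with NumPy. Several details are assumptions not stated in the abstract: stored minority examples are ranked by how many of their k nearest neighbors in the current chunk are minority examples, the chunk is topped up to a hypothetical `balance_ratio` of minority to majority examples, a hypothetical nearest-centroid classifier stands in for the base learner, and ensemble members vote without weights. Labels are assumed to be 0 (majority) and 1 (minority). The class names `NearestCentroid` and `REASketch` are illustrative only.

```python
import numpy as np


class NearestCentroid:
    """Hypothetical stand-in base learner: predicts the class with the closest mean."""

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[np.argmin(d, axis=1)]


class REASketch:
    """Chunk-by-chunk ensemble; labels assumed to be 0 (majority) and 1 (minority)."""

    def __init__(self, k=5, balance_ratio=0.5):
        self.k = k
        self.balance_ratio = balance_ratio  # hypothetical target minority/majority ratio
        self.stored_minority = None  # minority examples seen in earlier chunks
        self.ensemble = []

    def _select_previous(self, X, y, n_needed):
        """Rank stored minority examples by how many of their k nearest
        neighbors in the current chunk are minority; keep the top n_needed."""
        if self.stored_minority is None or n_needed <= 0:
            return np.empty((0, X.shape[1]))
        P = self.stored_minority
        d = np.linalg.norm(P[:, None, :] - X[None, :, :], axis=2)
        nn = np.argsort(d, axis=1)[:, : self.k]
        score = (y[nn] == 1).sum(axis=1)
        order = np.argsort(-score, kind="stable")
        return P[order[:n_needed]]

    def partial_fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        is_min = y == 1
        n_needed = int(self.balance_ratio * np.sum(~is_min)) - int(np.sum(is_min))
        extra = self._select_previous(X, y, n_needed)
        # Train a new member on the chunk augmented with selected past minority data.
        X_aug = np.vstack([X, extra])
        y_aug = np.concatenate([y, np.ones(len(extra), dtype=y.dtype)])
        self.ensemble.append(NearestCentroid().fit(X_aug, y_aug))
        # Keep this chunk's minority examples for later chunks.
        new_min = X[is_min]
        if self.stored_minority is None:
            self.stored_minority = new_min
        else:
            self.stored_minority = np.vstack([self.stored_minority, new_min])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Unweighted vote: fraction of members predicting the minority class.
        votes = np.mean([m.predict(X) == 1 for m in self.ensemble], axis=0)
        return (votes >= 0.5).astype(int)
```

Selecting past minority examples by their neighborhood in the current chunk is one way to favor examples that still resemble the current minority concept under nonstationarity; a real implementation might also weight ensemble members by their accuracy on recent data.
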

Bibliographic record

  • Author: Chen, Sheng
  • Author affiliation: Stevens Institute of Technology
  • Degree-granting institution: Stevens Institute of Technology
  • Subject: Engineering, Computer
  • Degree: Ph.D.
  • Year: 2011
  • Pages: 162
  • Original format: PDF
  • Language: English