Adaptive learning from data flow and imbalanced data.

Abstract

This dissertation studies adaptive machine learning algorithms designed to address two problems: learning from data flow and learning from data sets with imbalanced class distributions. These problems matter to the machine learning community because they are prevalent in many areas of modern business, such as medical diagnosis and online fraud detection.

Unlike in conventional machine learning settings, data flow becomes available continuously over time. In this scenario, the challenge is to transform the vast amount of raw stream data into information and knowledge representations, and to accumulate experience over time to support future decision-making. To this end, this dissertation introduces an ADAptive INcremental learning (ADAIN) framework. Starting with a description of the system-level architecture, the research explores the design of the mapping function for knowledge transfer, studies the error bound theoretically, and concludes by presenting simulation results on a video clip and on other real-world data sets.

Traditional machine learning techniques generally fail to achieve satisfactory performance on imbalanced data because they assume a balanced class distribution. Building on adaptive over-sampling and boosting, this dissertation presents the RAnked Minority Over-Sampling in Boosting (RAMOBoost) algorithm to alleviate this difficulty. Integrated with the AdaBoost learning framework, RAMOBoost adaptively ranks minority class examples at each iteration: the chance that an example is sampled to create a synthetic instance is set in proportion to the ratio of majority class cases among its k-nearest neighbors. In this way, the decision boundary can be progressively shifted towards difficult-to-learn minority class and majority class examples simultaneously. The dissertation covers the algorithmic description of RAMOBoost and extensive simulations that validate its competitiveness against existing methods.

This dissertation also explores a Recursive Ensemble Approach (REA) for learning from nonstationary data flow with imbalanced class distribution. REA pushes a selected part of the previous minority class examples into the current data chunk to balance its class distribution; a classifier is then built on the balanced chunk and added to an ensemble for future prediction. Both theoretical and empirical studies are carried out to compare REA with other methods.
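The ranking step that the abstract describes for RAMOBoost can be illustrated with a minimal sketch. It follows only the abstract's wording (sampling chance proportional to the share of majority class cases among the k-nearest neighbors); the full algorithm in the dissertation may transform or combine these ratios differently. The function name `minority_sampling_probs`, its parameters, and the toy data are hypothetical and chosen for illustration.

```python
import numpy as np


def minority_sampling_probs(X, y, minority_label=1, k=5):
    """Hypothetical helper: sampling probability for each minority example.

    Assumption (from the abstract only): the chance of picking a minority
    example is proportional to the fraction of majority-class points among
    its k nearest neighbors. This is not the dissertation's exact formula.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    k = min(k, len(X) - 1)  # cannot have more neighbors than other points
    minority_idx = np.flatnonzero(y == minority_label)

    # Euclidean distance from every minority example to every example.
    diffs = X[minority_idx][:, None, :] - X[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    # An example is not its own neighbor.
    dists[np.arange(len(minority_idx)), minority_idx] = np.inf

    # Indices of the k nearest neighbors of each minority example.
    neighbors = np.argsort(dists, axis=1)[:, :k]
    # Fraction of those neighbors that belong to the majority class.
    majority_ratio = (y[neighbors] != minority_label).mean(axis=1)

    total = majority_ratio.sum()
    if total == 0:
        # No minority example borders the majority class: fall back to uniform.
        probs = np.full(len(minority_idx), 1.0 / len(minority_idx))
    else:
        probs = majority_ratio / total
    return minority_idx, probs


# Toy one-dimensional data: example 3 is a minority point surrounded by
# majority points, while examples 5 and 6 sit next to each other.
X = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [5.0], [5.1]])
y = np.array([0, 0, 0, 1, 0, 1, 1])
idx, probs = minority_sampling_probs(X, y, k=2)
rng = np.random.default_rng(0)
seed_example = rng.choice(idx, p=probs)  # minority example to synthesize from
```

In this toy case the isolated minority example (index 3) receives the largest sampling probability, which matches the stated goal of concentrating synthetic instances near the difficult part of the decision boundary.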

Bibliographic record

Author: Chen, Sheng
Author affiliation: Stevens Institute of Technology
Degree-granting institution: Stevens Institute of Technology
Subject: Engineering, Computer
Degree: Ph.D.
Year: 2011
Pages: 162
Original format: PDF
Language: English

Similar literature

1. Self-Adaptive Multiprototype-Based Competitive Learning Approach: A k-Means-Type Algorithm for Imbalanced Data Clustering [J]. Lu Yang, Cheung Yiu-Ming, Tang Yuan Yan. IEEE Transactions on Cybernetics. 2021, no. 3
2. Adaptive sampling using self-paced learning for imbalanced cancer data pre-diagnosis [J]. Wang Qingyong, Zhou Yun, Zhang Weiming. Expert Systems with Applications. 2020, no. Auga
3. AdaBoost-CNN: An adaptive boosting algorithm for convolutional neural networks to classify multi-class imbalanced datasets using transfer learning [J]. Taherkhani Aboozar, Cosma Georgina, McGinnity T. M. Neurocomputing. 2020, no. Sepa3
4. Effectiveness of Basic and Advanced Sampling Strategies on the Classification of Imbalanced Data. A Comparative Study Using Classical and Novel Metrics [C]. Mohamed S. Kraiem, Maria N. Moreno. International Conference on Hybrid Artificial Intelligent Systems. 2017
5. Learning in extreme conditions: Online and active learning with massive, imbalanced and noisy data. [D]. Ertekin, Seyda. 2009
6. Addendum: Hemmer S. et al. Comparison of Three Untargeted Data Processing Workflows for Evaluating LC-HRMS Metabolomics Data. [O]. Selina Hemmer, Sascha K. Manier, Svenja Fischmann. 2020
7. Machine learning for network based intrusion detection: an investigation into discrepancies in findings with the KDD cup '99 data set and multi-objective evolution of neural network classifier ensembles from imbalanced data. [O]. Engen Vegard 100
8. Neuromorphic learning of continuous-valued mappings from noise-corrupted data. Application to real-time adaptive control [R]. Troudet, Terry, Merrill, Walter C. 1990
