STDS: self-training data streams for mining limited labeled data in non-stationary environment

Abstract

In this article, we focus on the classification problem in semi-supervised learning in non-stationary environments. Semi-supervised learning is the task of learning from both labeled and unlabeled data points. Several approaches to semi-supervised learning exist for stationary environments, but they are not directly applicable to data streams. We propose a novel semi-supervised learning algorithm, named STDS. The proposed approach uses labeled and unlabeled data and includes a mechanism for handling concept drift in data streams. The main challenge in semi-supervised self-training for data streams is to find a proper selection metric for identifying a set of high-confidence predictions, together with a proper underlying base learner. We therefore propose an ensemble approach that finds a set of high-confidence predictions based on clustering algorithms and classifier predictions. We then employ the Kullback-Leibler (KL) divergence to measure the distribution differences between sequential chunks in order to detect concept drift. When drift is detected, a new classifier is built from the new set of labeled data in the current chunk; otherwise, a percentage of the high-confidence newly labeled data in the current chunk, chosen by the proposed selection metric, is added to the labeled data in the next chunk to update the incremental classifier. The results of our experiments on a number of classification benchmark datasets show that STDS outperforms supervised learning and most of the other semi-supervised learning methods.
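
The chunk-to-chunk drift check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the chunk distributions are estimated, so the sketch assumes a single numeric feature, a shared equal-width histogram with Laplace smoothing, and a fixed threshold. The names `histogram`, `kl_divergence`, `drift_detected`, `n_bins`, and `threshold` are hypothetical and chosen only for illustration.

```python
import math


def histogram(values, edges):
    """Smoothed relative frequencies of `values` over shared bin edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Count how many interior edges v reaches; this is its bin index.
        idx = sum(1 for e in edges[1:-1] if v >= e)
        counts[idx] += 1
    # Laplace smoothing (assumption) keeps every bin non-zero, so log() is safe.
    total = len(values) + len(counts)
    return [(c + 1) / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def drift_detected(prev_chunk, curr_chunk, n_bins=10, threshold=0.1):
    """Flag drift when KL(current || previous) exceeds an assumed threshold."""
    lo = min(min(prev_chunk), min(curr_chunk))
    hi = max(max(prev_chunk), max(curr_chunk))
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range
    edges = [lo + i * width for i in range(n_bins + 1)]
    p = histogram(curr_chunk, edges)
    q = histogram(prev_chunk, edges)
    return kl_divergence(p, q) > threshold


if __name__ == "__main__":
    stable = [i % 10 for i in range(200)]
    shifted = [v + 8 for v in stable]
    print(drift_detected(stable, stable))   # same distribution in both chunks
    print(drift_detected(stable, shifted))  # current chunk shifted upward
```

In a pipeline like the one the abstract outlines, a positive result would trigger building a new classifier from the labeled data of the current chunk, while a negative result would let the high-confidence pseudo-labeled points carry over to the next chunk. The threshold and the histogram estimator are tuning choices made here for the sketch, not values taken from the paper.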

Bibliographic details

Source
Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies | 2020, Issue 5 | 20 pages
Original format: PDF
Language of text: English
Chinese Library Classification: Automation technology, computer technology
Keywords
Semi-supervised learning; Self-training; Data streams; Concept drift; Clustering algorithm
