...
Journal: Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies

STDS: self-training data streams for mining limited labeled data in non-stationary environment

Abstract

In this article, we focus on the classification problem in semi-supervised learning in non-stationary environments. Semi-supervised learning is the task of learning from both labeled and unlabeled data points. Several approaches to semi-supervised learning exist for stationary environments, but they are not directly applicable to data streams. We propose a novel semi-supervised learning algorithm named STDS. The proposed approach uses both labeled and unlabeled data and includes a mechanism for handling concept drift in data streams. The main challenges in semi-supervised self-training for data streams are finding a proper selection metric for identifying a set of high-confidence predictions and choosing a proper underlying base learner. We therefore propose an ensemble approach, based on clustering algorithms and classifier predictions, for finding a set of high-confidence predictions. We then use the Kullback-Leibler (KL) divergence to measure the difference between the distributions of sequential chunks in order to detect concept drift. When drift is detected, a new classifier is trained on the new labeled data in the current chunk; otherwise, a percentage of the high-confidence newly labeled data in the current chunk, chosen by the proposed selection metric, is added to the labeled data of the next chunk to update the incremental classifier. Experiments on a number of benchmark classification datasets show that STDS outperforms the supervised method and most of the other semi-supervised learning methods.
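The per-chunk loop described in the abstract (a KL-based drift check, selection of high-confidence predictions, and the retrain-or-extend update) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' STDS implementation: it computes $D_{KL}(P \| Q) = \sum_i p_i \log(p_i / q_i)$ on histograms of a single numeric feature, takes cluster ids as already computed, and replaces the paper's unspecified selection metric with a hypothetical agreement rule (a prediction is kept only if its label matches the majority predicted label of its cluster). The function names `detect_drift`, `select_high_confidence` and `next_training_set`, and the `bins`, `threshold` and `fraction` defaults, are illustrative assumptions.

```python
import math
from collections import Counter


def histogram(values, lo, hi, bins):
    """Normalized histogram of 1-D values over [lo, hi] using equal-width bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        # Clamp to a valid bin so that v == hi lands in the last bin.
        idx = min(max(int((v - lo) / width), 0), bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


def kl_divergence(p, q, eps=1e-9):
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i); q_i is floored at eps to avoid division by zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def detect_drift(prev_chunk, curr_chunk, bins=10, threshold=0.1):
    """Flag drift when D_KL(current || previous) over a shared binning exceeds `threshold`.

    Assumption: one numeric feature per point; `bins` and `threshold` are
    hypothetical defaults, not values taken from the paper.
    """
    lo = min(min(prev_chunk), min(curr_chunk))
    hi = max(max(prev_chunk), max(curr_chunk))
    if hi == lo:  # both chunks hold the same constant value: nothing to compare
        return False
    p = histogram(curr_chunk, lo, hi, bins)
    q = histogram(prev_chunk, lo, hi, bins)
    return kl_divergence(p, q) > threshold


def select_high_confidence(preds, clusters, fraction=0.2):
    """Hypothetical ensemble-style selection rule (not the paper's metric).

    preds:    list of (predicted_label, confidence) from the base classifier
    clusters: cluster id of each point (e.g. from k-means, computed elsewhere)
    Keeps points whose predicted label equals the majority predicted label of
    their cluster, then returns the indices of the top `fraction` of those,
    ranked by classifier confidence.
    """
    votes = {}
    for (label, _), c in zip(preds, clusters):
        votes.setdefault(c, Counter())[label] += 1
    majority = {c: cnt.most_common(1)[0][0] for c, cnt in votes.items()}
    agreed = [i for i, ((label, _), c) in enumerate(zip(preds, clusters))
              if label == majority[c]]
    agreed.sort(key=lambda i: preds[i][1], reverse=True)
    return agreed[: int(len(agreed) * fraction)]


def next_training_set(drift, labeled, pseudo_labeled):
    """Update rule paraphrased from the abstract: after drift, train a new
    classifier on the labeled data only; otherwise, add the selected
    high-confidence pseudo-labeled points to the labeled data for the
    incremental update."""
    if drift:
        return "retrain", list(labeled)
    return "incremental", list(labeled) + list(pseudo_labeled)
```

For example, with `prev = [0.1 * i for i in range(10)]`, `detect_drift(prev, list(prev))` returns `False`, while `detect_drift(prev, [x + 5 for x in prev])` returns `True`, because the shifted chunk puts all of its mass in histogram bins where the previous chunk has none. Flooring $q_i$ at `eps` keeps the divergence finite in that case while still making it large.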