Classifying evolving data streams with partially labeled data

Abstract

Recently, several approaches have been proposed to deal with the increasingly challenging task of mining concept-drifting data streams. However, most are based on supervised classification algorithms assuming that true labels are immediately and entirely available in the data streams. Unfortunately, such an assumption is often violated in real-world applications given that it is expensive or because it takes a long time to obtain all true labels. To deal with this problem, we propose in this paper a new semi-supervised approach for handling concept-drifting data streams containing both labeled and unlabeled instances. First, contrary to existing approaches, we monitor three possible kinds of drift: feature, conditional or dual drift. Drift detection is based on a hypothesis test comparing Kullback-Leibler divergence between old and recent data, whose distribution under the null hypothesis of coming from the same distribution is approximated via a bootstrap method. Then, if any drift occurs, a new classifier is learned from the recent data using the EM algorithm; otherwise, the current classifier is left unchanged. Our approach is so general that it can be applied to different classification models. Experimental studies, using the naive Bayes classifier and logistic regression, on both synthetic and real-world data sets demonstrate that our approach performs well.
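
The drift test described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it handles only feature drift on a single categorical variable, it uses Laplace smoothing to keep the divergence finite, it computes KL(recent || old), and it builds the null distribution by resampling both windows from the pooled data. The abstract specifies none of these details, and the names `kl_divergence`, `drift_p_value`, and `drift_detected` are hypothetical.

```python
import math
import random
from collections import Counter


def kl_divergence(p_sample, q_sample, categories, alpha=1.0):
    """KL(P || Q) between smoothed empirical distributions of two samples."""
    p_counts, q_counts = Counter(p_sample), Counter(q_sample)
    k = len(categories)
    kl = 0.0
    for c in categories:
        # Laplace smoothing (alpha) keeps every probability strictly positive.
        p = (p_counts[c] + alpha) / (len(p_sample) + alpha * k)
        q = (q_counts[c] + alpha) / (len(q_sample) + alpha * k)
        kl += p * math.log(p / q)
    return kl


def drift_p_value(old, recent, n_boot=500, seed=0):
    """Bootstrap p-value for H0: both windows come from the same distribution."""
    rng = random.Random(seed)
    categories = sorted(set(old) | set(recent))
    observed = kl_divergence(recent, old, categories)
    pooled = list(old) + list(recent)
    exceed = 0
    for _ in range(n_boot):
        # Assumption: under H0, each window is a resample of the pooled data.
        boot_old = rng.choices(pooled, k=len(old))
        boot_recent = rng.choices(pooled, k=len(recent))
        if kl_divergence(boot_recent, boot_old, categories) >= observed:
            exceed += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (exceed + 1) / (n_boot + 1)


def drift_detected(old, recent, significance=0.05, **kwargs):
    """Reject H0 (declare drift) when the bootstrap p-value is small."""
    return drift_p_value(old, recent, **kwargs) < significance
```

In a streaming setting, one would call `drift_detected(old_window, recent_window)` on each new batch and, following the abstract, relearn the classifier from the recent data (with EM, to exploit unlabeled instances) only when it returns `True`. Monitoring conditional drift would need an analogous test on the class distribution given the features, which this sketch does not cover.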

Bibliographic record

  • Source
    Intelligent Data Analysis | 2011, Issue 5 | pp. 655-670 | 16 pages
  • Author affiliations

    Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain;

    Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain;

    Departamento de Inteligencia Artificial, Facultad de Informática, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain;

  • Indexed in: Science Citation Index (SCI), USA;
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    data streams; concept drift; change detection; semi-supervised learning;

