Building a New Classifier in an Ensemble Using Streaming Unlabeled Data

Abstract

It is expensive and impractical to manually label every sample in real-world streaming data when the correct class is not available in real time. In this paper, we propose an ensemble method that determines which samples from an unlabeled stream should be labeled, and when, based on changes in the distribution of the unlabeled data. In particular, the timing of labeling is an important practical factor in building an efficient ensemble. To evaluate our ensemble method, we used synthetic streaming data with concept drift and the intrusion detection data from the KDD'99 Cup. We compared the proposed method with existing ensemble methods that periodically build new classifiers for an ensemble. On the synthetic streaming data, the proposed method achieved 14.1% higher classification accuracy on average and built 12.6% fewer new classifiers on average. On the intrusion detection data, our method achieved accuracy similar to that of existing methods while using only 0.007% of the labeled streaming data.
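The abstract does not say how a change in distribution is detected, so the following is only a minimal sketch of the general idea: request labels and a new classifier when the unlabeled stream appears to have shifted, rather than at fixed intervals. The function name `drift_points`, the window size, the threshold, and the mean-shift test itself are all assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev


def drift_points(stream, window=50, threshold=3.0):
    """Return the start indices of windows where labeling would be triggered.

    Hypothetical rule (not the paper's): compare the mean of each new
    window of unlabeled values with a reference window, and flag drift
    when the gap exceeds `threshold` standard errors of the reference.
    """
    points = []
    reference = stream[:window]
    for start in range(window, len(stream) - window + 1, window):
        current = stream[start:start + window]
        spread = pstdev(reference) or 1e-9  # guard against a constant window
        std_error = spread / window ** 0.5
        z = abs(mean(current) - mean(reference)) / std_error
        if z > threshold:
            # Samples from this window would be sent for labeling and a
            # new ensemble member would be trained on them.
            points.append(start)
            reference = current  # the new concept becomes the reference
    return points


# Toy one-dimensional stream: the level jumps from about 0 to about 5
# at index 200, imitating an abrupt concept drift.
stream = [(i % 5) * 0.1 for i in range(200)] + [5 + (i % 5) * 0.1 for i in range(200)]
print(drift_points(stream))  # [200]
```

In this toy run labeling is triggered once, at the window where the level jumps, whereas a strictly periodic schedule with the same window size would request labels at every window regardless of whether anything had changed.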