Journal of Intelligent Information Systems

Learning from data streams with only positive and unlabeled data

Abstract

Many studies on streaming data classification have been based on a paradigm in which a fully labeled stream is available for learning purposes. However, it is often too labor-intensive and time-consuming to manually label a data stream for training. This difficulty may make conventional supervised learning approaches infeasible in many real-world applications, such as credit fraud detection, intrusion detection, and rare event prediction. In previous work, Li et al. suggested that these applications be treated as a Positive and Unlabeled learning problem, and proposed a learning algorithm, OcVFDT, as a solution (Li et al. 2009). Their method requires only a set of positive examples and a set of unlabeled examples, both of which are easily obtainable in a streaming environment, making it widely applicable to real-life applications. Here, we enhance Li et al.'s solution by adding three features: an efficient method to estimate the percentage of positive examples in the training stream, the ability to handle numeric attributes, and the use of more appropriate classification methods at tree leaves. Experimental results on synthetic and real-life datasets show that our enhanced solution (called PUVFDT) has very good classification performance and a strong ability to learn from data streams with only positive and unlabeled examples. Furthermore, our enhanced solution reduces the learning time of OcVFDT by about an order of magnitude. Even with 80% of the examples in the training data stream unlabeled, PUVFDT can still achieve classification performance competitive with that of VFDTcNB (Gama et al. 2003), a supervised learning algorithm.
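To make the positive-and-unlabeled setting concrete, the following is a minimal sketch of how a fully labeled stream could be turned into a stream that carries only positive and unlabeled examples. It is an illustration only and is not taken from the paper: the function names `to_pu_stream` and `unlabeled_fraction`, the `label_prob` parameter, and the masking rule (each positive keeps its label independently with a fixed probability, every negative loses its label) are assumptions, and the authors' experimental protocol may differ.

```python
import random
from typing import Iterable, Iterator, List, Optional, Tuple

Features = Tuple[float, ...]
Labeled = Tuple[Features, int]          # true label: 1 = positive, 0 = negative
PU = Tuple[Features, Optional[int]]     # visible label: 1 = positive, None = unlabeled


def to_pu_stream(stream: Iterable[Labeled], label_prob: float, seed: int = 0) -> Iterator[PU]:
    """Hypothetical helper: hide labels so that only some positives stay labeled.

    A positive example keeps its label with probability `label_prob`;
    every other example (positive or negative) is emitted as unlabeled.
    """
    rng = random.Random(seed)
    for features, label in stream:
        if label == 1 and rng.random() < label_prob:
            yield features, 1
        else:
            yield features, None


def unlabeled_fraction(examples: List[PU]) -> float:
    """Share of examples whose label is hidden."""
    return sum(1 for _, lab in examples if lab is None) / len(examples)


# Toy stream with alternating negative/positive examples (assumed data).
toy_stream = [((float(i),), i % 2) for i in range(1000)]
pu_examples = list(to_pu_stream(toy_stream, label_prob=0.4, seed=1))
print(f"unlabeled fraction: {unlabeled_fraction(pu_examples):.2f}")
```

With half of the toy stream positive and `label_prob=0.4`, roughly 80% of the examples end up unlabeled, which is the regime the abstract refers to. A learner in this setting sees no explicit negative examples at all.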
