Learning from data streams with only positive and unlabeled data

Xiangju Qin; Yang Zhang; Chen Li; Xue Li


Abstract

Many studies on streaming data classification have been based on a paradigm in which a fully labeled stream is available for learning purposes. However, it is often too labor-intensive and time-consuming to manually label a data stream for training. This difficulty may make conventional supervised learning approaches infeasible in many real-world applications, such as credit fraud detection, intrusion detection, and rare event prediction. In previous work, Li et al. suggested that these applications be treated as a Positive and Unlabeled learning problem, and proposed a learning algorithm, OcVFDT, as a solution (Li et al. 2009). Their method requires only a set of positive examples and a set of unlabeled examples, which is easily obtainable in a streaming environment, making it widely applicable to real-life applications. Here, we enhance Li et al.'s solution by adding three features: an efficient method to estimate the percentage of positive examples in the training stream, the ability to handle numeric attributes, and the use of more appropriate classification methods at tree leaves. Experimental results on synthetic and real-life datasets show that our enhanced solution (called PUVFDT) has very good classification performance and a strong ability to learn from data streams with only positive and unlabeled examples. Furthermore, our enhanced solution reduces the learning time of OcVFDT by about an order of magnitude. Even with 80% of the examples in the training data stream unlabeled, PUVFDT can still achieve classification performance competitive with that of VFDTcNB (Gama et al. 2003), a supervised learning algorithm.
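The positive-and-unlabeled setting described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the function names `to_pu_stream` and `toy_stream`, the parameter `keep_positive_prob`, and the one-dimensional Gaussian toy data are all hypothetical, and the abstract does not say how the authors built their training streams. The sketch only shows what a learner sees in this setting: some positives keep their label, and everything else (remaining positives and all negatives) arrives unlabeled.

```python
import random
from typing import Iterable, Iterator, Optional, Tuple

Features = Tuple[float, ...]
Example = Tuple[Features, int]              # (features, true label in {0, 1})
PUExample = Tuple[Features, Optional[int]]  # label is 1 (known positive) or None (unlabeled)


def to_pu_stream(
    stream: Iterable[Example],
    keep_positive_prob: float = 0.2,  # hypothetical knob, not a parameter from the paper
    seed: int = 0,
) -> Iterator[PUExample]:
    """Turn a fully labeled stream into a positive-and-unlabeled stream.

    A positive example keeps its label with probability keep_positive_prob.
    Every other example is emitted with label None, so negatives are never
    revealed to the learner.
    """
    rng = random.Random(seed)
    for features, label in stream:
        if label == 1 and rng.random() < keep_positive_prob:
            yield features, 1
        else:
            yield features, None


def toy_stream(n: int, positive_rate: float = 0.3, seed: int = 1) -> Iterator[Example]:
    """Hypothetical labeled stream: one numeric attribute, two Gaussian classes."""
    rng = random.Random(seed)
    for _ in range(n):
        label = 1 if rng.random() < positive_rate else 0
        center = 1.0 if label == 1 else -1.0
        yield (rng.gauss(center, 1.0),), label


if __name__ == "__main__":
    pu = list(to_pu_stream(toy_stream(1000)))
    labeled = sum(1 for _, y in pu if y == 1)
    print(f"{labeled} labeled positives, {len(pu) - labeled} unlabeled examples")
```

Because the unlabeled part mixes hidden positives with negatives, a learner in this setting needs some estimate of how many positives the stream contains, which is the first of the three enhancements the abstract lists.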

Bibliographic details

Source
Journal of Intelligent Information Systems, 2013, No. 3, pp. 405-430 (26 pages)
Authors
Xiangju Qin; Yang Zhang; Chen Li; Xue Li
Author affiliations

College of Information Engineering, Northwest A&F University, Yangling, China;

College of Information Engineering, Northwest A&F University, Yangling, China, and State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China;

College of Information Engineering, Northwest A&F University, Yangling, China;

School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, Australia;

Original format: PDF
Language: English
Keywords
Positive and unlabeled learning; Data stream classification; Incremental learning; Functional leaves;

Similar literature

1. Learning very fast decision tree from uncertain data streams with positive and unlabeled samples [J]. Liang C., Zhang Y., Shi P. Information Sciences: An International Journal, 2013.
2. Learning from concept drifting data streams with unlabeled data [J]. Xindong Wu, Peipei Li, Xuegang Hu. Neurocomputing, 2012.
3. Two birds with one stone: Classifying positive and unlabeled examples on uncertain data streams [J]. Donghong Han, Shuoru Li, Fulin Wei. Neurocomputing, 2018, issue FEBa14.
4. Positive Unlabeled Learning for Data Stream Classification [C]. Xiao-Li Li, Philip S. Yu, Bing Liu. SIAM International Conference on Data Mining, 2009.
5. Learning with unlabeled data [D]. Xu, Zenglin. 2009.
6. Facilitating information extraction without annotated data using unsupervised and positive-unlabeled learning [O]. Zfania Tom Korach, Sharmitha Yerneni, Jonathan Einbinder. 2020.
7. Positive unlabeled learning for data stream classification [O]. Xiao-li Li, Philip S. Yu, Bing Liu. 2009.
