Learning optimal classifier chains for real-time big data mining

Abstract

A plethora of emerging Big Data applications require processing and analyzing streams of data to extract valuable information in real time. For this, chains of classifiers that can detect various concepts need to be constructed in real time. In this paper, we propose online distributed algorithms that learn how to construct the optimal classifier chain in order to maximize the stream mining performance (i.e., mining accuracy minus cost) based on the dynamically changing data characteristics. The proposed solution does not require the distributed local classifiers to exchange any information when learning at run-time. Moreover, our algorithm requires only limited feedback of the mining performance to enable the learning of the optimal classifier chain. We model the learning problem of the optimal classifier chain at run-time as a multi-player multi-armed bandit problem with limited feedback. To the best of our knowledge, this paper is the first to apply bandit techniques to stream mining problems. However, existing bandit algorithms are inefficient in the considered scenario because each component classifier learns its optimal classification function using only the aggregate overall reward, without knowing its own individual reward and without exchanging information with other classifiers. We prove that the proposed algorithms achieve logarithmic learning regret uniformly over time and hence are order optimal. Therefore, the long-term time-average performance loss tends to zero. We also design learning algorithms whose regret is linear in the number of classification functions. This is much smaller than the regret obtainable with existing bandit algorithms, which scale linearly in the number of classifier chains and exponentially in the number of classification functions.
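As a rough illustration of the baseline the abstract contrasts against, the sketch below runs the standard UCB1 bandit rule with every complete classifier chain treated as a single arm. It is not the algorithm proposed in the paper, which is distributed and works from aggregate feedback only. The function name `ucb1_chain_selection`, the example reward means, and the Bernoulli reward model (standing in for "mining accuracy minus cost", rescaled to [0, 1]) are all assumptions made for illustration.

```python
import math
import random


def ucb1_chain_selection(mean_rewards, horizon, seed=0):
    """Run UCB1 where every arm is one complete classifier chain.

    mean_rewards[i] is the (hypothetical) expected reward of chain i,
    standing in for mining accuracy minus cost; the learner never sees
    it directly, only noisy 0/1 feedback.
    Returns (pull counts per chain, cumulative expected regret).
    """
    rng = random.Random(seed)
    n_arms = len(mean_rewards)
    counts = [0] * n_arms   # how often each chain was selected
    sums = [0.0] * n_arms   # total observed reward per chain
    best = max(mean_rewards)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play every chain once to initialise estimates
        else:
            # Pick the chain with the highest upper confidence bound.
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        # Assumed Bernoulli feedback: 1 with probability mean_rewards[arm].
        reward = 1.0 if rng.random() < mean_rewards[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - mean_rewards[arm]
    return counts, regret


if __name__ == "__main__":
    # Hypothetical setup: 2 classifiers x 2 candidate functions = 4 chains.
    chain_means = [0.30, 0.45, 0.50, 0.80]
    counts, regret = ucb1_chain_selection(chain_means, horizon=5000)
    print("pulls per chain:", counts)
    print("cumulative expected regret:", round(regret, 1))
```

Because this baseline keeps one arm per chain, its bookkeeping grows with the number of chains, which is the scaling the abstract says the proposed algorithms avoid.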

Bibliographic details

Source
Annual Allerton Conference on Communication, Control, and Computing | 2013 | pp. 512-519 | 8 pages
Authors
Xu Jie; Tekin Cem; van der Schaar Mihaela
Original format
PDF

Similar literature

1. A Rules-Based Approach for Configuring Chains of Classifiers in Real-Time Stream Mining Systems [J]. Brian Foo, Mihaela van der Schaar. EURASIP Journal on Advances in Signal Processing. 2009, no. 7
2. A Rules-Based Approach for Configuring Chains of Classifiers in Real-Time Stream Mining Systems [J]. Brian Foo, Mihaela van der Schaar. EURASIP Journal on Advances in Signal Processing. 2009, no. 1
3. Productivity estimation of cutter suction dredger operation through data mining and learning from real-time big data [J]. Fu Jiake, Tian Huijing, Song Lingguang. Engineering, Construction and Architectural Management. 2021, no. 7
4. Learning optimal classifier chains for real-time big data mining [C]. Xu Jie, Tekin Cem, van der Schaar Mihaela. Annual Allerton Conference on Communication, Control, and Computing. 2013
5. Automated Design for Manufacturing and Supply Chain Using Geometric Data Mining and Machine Learning [D]. Hoefer, Michael Jeffrey Daniel. 2017
6. Evaluation of Stream Mining Classifiers for Real-Time Clinical Decision Support System: A Case Study of Blood Glucose Prediction in Diabetes Therapy [O]. Simon Fong, Yang Zhang, Jinan Fiaidhi. 2006
7. A Rules-Based Approach for Configuring Chains of Classifiers in Real-Time Stream Mining Systems [O]. Brian Foo, Mihaela van der Schaar. 2009
8. Classifying Noisy Protein Sequence Data: A Case Study of Immunoglobulin Light Chains; Journal article [R]. Yu, C., Zavaljevski, N., Stevens, F. J. 2005