Annual Allerton Conference on Communication, Control, and Computing

Learning optimal classifier chains for real-time big data mining

Abstract

A plethora of emerging Big Data applications require processing and analyzing streams of data to extract valuable information in real time. This requires constructing, in real time, chains of classifiers that can detect various concepts. In this paper, we propose online distributed algorithms that learn how to construct the optimal classifier chain so as to maximize stream mining performance (i.e., mining accuracy minus cost) based on dynamically changing data characteristics. The proposed solution does not require the distributed local classifiers to exchange any information when learning at runtime. Moreover, our algorithm requires only limited feedback on mining performance to learn the optimal classifier chain. We model the run-time learning of the optimal classifier chain as a multi-player multi-armed bandit problem with limited feedback. To the best of our knowledge, this paper is the first to apply bandit techniques to stream mining problems. However, existing bandit algorithms are inefficient in this scenario because each component classifier must learn its optimal classification functions using only the aggregate overall reward, without knowing its own individual reward and without exchanging information with other classifiers. We prove that the proposed algorithms achieve logarithmic learning regret uniformly over time and are therefore order optimal, so the long-term time-average performance loss tends to zero. We also design learning algorithms whose regret is linear in the number of classification functions. This is much smaller than the regret obtainable with existing bandit algorithms, which scales linearly in the number of classifier chains and exponentially in the number of classification functions.
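
To make the scaling comparison above concrete, here is a minimal Python sketch of the kind of baseline the abstract contrasts against: a standard centralized UCB1 learner that treats each whole classifier chain as a single arm and observes only the chain's aggregate reward. It is not the authors' distributed algorithm. The function name `ucb1_over_chains`, the Bernoulli reward model, and the per-classifier function counts are hypothetical assumptions chosen only for illustration.

```python
import itertools
import math
import random


def ucb1_over_chains(functions_per_classifier, mean_reward, horizon, seed=0):
    """Centralized UCB1 baseline that treats every classifier chain as one arm.

    functions_per_classifier: number of classification functions available at each
        classifier in the chain (hypothetical parameter).
    mean_reward: callable mapping a chain (tuple of function indices) to its expected
        reward in [0, 1], e.g. a normalized "accuracy minus cost" (assumed model).
    Returns the list of chains and how often each chain was selected.
    """
    rng = random.Random(seed)
    # One arm per combination of functions: the arm count is the product of the
    # per-classifier counts, so it grows exponentially as classifiers are added.
    chains = list(itertools.product(*(range(k) for k in functions_per_classifier)))
    counts = {chain: 0 for chain in chains}
    sums = {chain: 0.0 for chain in chains}

    for t in range(1, horizon + 1):
        untried = [chain for chain in chains if counts[chain] == 0]
        if untried:
            # Initialization: play every chain once.
            chosen = untried[0]
        else:
            log_t = math.log(t)
            # UCB1 index: empirical mean plus exploration bonus.
            chosen = max(
                chains,
                key=lambda c: sums[c] / counts[c] + math.sqrt(2.0 * log_t / counts[c]),
            )
        # Limited feedback: only the aggregate reward of the whole chain is
        # observed, simulated here as a Bernoulli draw (assumption).
        reward = 1.0 if rng.random() < mean_reward(chosen) else 0.0
        counts[chosen] += 1
        sums[chosen] += reward

    return chains, counts


if __name__ == "__main__":
    # Hypothetical setting: 3 classifiers with 4 functions each -> 64 chains (arms).
    best = (0, 0, 0)
    chains, counts = ucb1_over_chains(
        [4, 4, 4], lambda c: 0.9 if c == best else 0.3, horizon=20000
    )
    print(len(chains), "chains; most-selected chain:", max(counts, key=counts.get))
```

Because every chain is a separate arm, the arm count is the product of the per-classifier function counts (4 × 4 × 4 = 64 in this example), and UCB1's logarithmic regret bound sums a term over every suboptimal arm. That per-chain growth is what the abstract's proposed algorithms are designed to avoid.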
