Journal: Computers & Mathematics with Applications

A Precise Statistical approach for concept change detection in unlabeled data streams



Abstract

Data streams have recently been studied extensively because they arise in many applications, such as sensor networks, web click streams, and network flows. One of the most important challenges in data streams is concept change, where the underlying distribution of the data changes over time. The vast majority of research on data stream mining is devoted to labeled data, whereas in real-world practice labels are rarely available to the learning algorithms. Moreover, most methods that detect changes in unlabeled data streams handle only numerical data sets, and they face considerable difficulty as the dimensionality of the data increases. In this paper, we present a Precise Statistical approach for Concept Change Detection in unlabeled data streams, abbreviated as PSCCD, which detects changes using an exchangeability test. This hypothesis test is derived from a martingale and is based on Doob's Maximal Inequality. The advantages of our approach are threefold. First, it does not require a sliding window on the data stream, the size of which is a well-known challenging issue; second, it works well on multi-dimensional data streams; and last but not least, it is applicable to different types of data, including categorical, numerical, and mixed-attribute data streams. To explore these advantages, we conducted extensive experiments with different settings and specifications. The obtained results are very promising.
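
The abstract does not spell out the test statistic, so the following is only a minimal sketch of a generic martingale exchangeability test, not PSCCD itself. It assumes a randomized power martingale over conformal-style p-values, with strangeness measured as Euclidean distance to the centroid of all points seen so far. The names `detect_change`, `epsilon`, and `threshold` are hypothetical, and the sketch handles numerical vectors only. For a non-negative martingale with M_0 = 1, Doob's Maximal Inequality gives P(max_k M_k >= threshold) <= 1/threshold under exchangeability, which bounds the false-alarm rate without any sliding window.

```python
import math
import random


def detect_change(stream, epsilon=0.92, threshold=20.0, seed=0):
    """Return the 0-based index where exchangeability is rejected, else None.

    Illustrative power-martingale test (an assumption, not the paper's method).
    Each item of `stream` is a tuple of floats of the same length.
    """
    rng = random.Random(seed)
    seen = []                      # every point observed so far (no window)
    log_m = 0.0                    # log of the martingale value, M_0 = 1
    log_threshold = math.log(threshold)

    for n, x in enumerate(stream):
        seen.append(x)
        dim = len(x)
        # Strangeness: distance of each point to the centroid of all points.
        centroid = [sum(p[d] for p in seen) / len(seen) for d in range(dim)]
        scores = [math.dist(p, centroid) for p in seen]
        latest = scores[-1]

        # Randomized p-value: fraction of points at least as strange as x.
        greater = sum(1 for s in scores if s > latest)
        equal = sum(1 for s in scores if s == latest)
        p_value = (greater + rng.random() * equal) / len(seen)
        p_value = max(p_value, 1e-12)  # guard against log(0)

        # Power martingale update: M_n = M_{n-1} * epsilon * p^(epsilon - 1).
        log_m += math.log(epsilon) + (epsilon - 1.0) * math.log(p_value)

        # Reject exchangeability once the martingale crosses the threshold.
        if log_m >= log_threshold:
            return n
    return None


if __name__ == "__main__":
    data_rng = random.Random(42)
    stable = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
    shifted = [(data_rng.gauss(5, 1), data_rng.gauss(5, 1)) for _ in range(200)]
    print(detect_change(stable + shifted))
```

In this sketch a larger `threshold` lowers the false-alarm bound at the cost of a longer detection delay. Recomputing every score at each step costs O(n) per point, which is acceptable for illustration but would need an incremental strangeness measure on a real stream.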
