Journal: Expert Systems with Applications

Novel hybrid pair recommendations based on a large-scale comparative study of concept drift detection


Abstract

During the classification of streaming data, changes in the underlying distribution make previously learned models unreliable and imprecise, a phenomenon known as concept drift. Online learning derives information from a vast volume of stream data, which are currently generated primarily by the Internet of Things, social media applications, and the stock market, and which are usually affected by these changes in unforeseen ways. There is abundant literature on addressing concept drift using detectors, which essentially attempt to estimate the position of the change so that the base learner can be altered to improve overall accuracy. This paper presents novel hybrid pairs (classifier and detector) collected from a large-scale comparison of 15 drift detectors: drift detection method (DDM), early drift detection method (EDDM), EWMA for concept drift detection (ECDD), adaptive sliding window (ADWIN), geometrical moving average (GMA), drift detection methods based on Hoeffding's bound (HDDMA and HDDMW), Fisher exact test drift detector (FTDD), fast Hoeffding drift detection method (FHDDM), Page-Hinkley test (PH), reactive drift detection method (RDDM), SEED, statistical test of equal proportions (STEPD), SeqDrift2, and Wilcoxon rank-sum test drift detector (WSTD); and six classifiers: Naive Bayes (NB), Hoeffding tree (HT), Hoeffding option tree (HOT), Perceptron (P), decision stump (DS), and k-nearest neighbour (KNN). The aim is to determine and recommend the best pair in accordance with the properties of the dataset. The objective of this study is to assess the contribution of a detector to a classifier and obtain the most efficient matched pairs. Through these pairwise comparison experiments, the accuracy rates and evaluation times of the pairs are measured, as well as their false positives, true negatives, false negatives, true positives, drift detection delay, and the Matthews correlation coefficient (MCC). Additionally, the Nemenyi test is employed to compare the pairs against other methods and to identify the method(s) for which there is a statistically significant difference. The results of the experiments indicate that the most efficient pairs, which differed for each dataset type and size, primarily include the HDDMA, RDDM, WSTD, and FHDDM detectors.
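To make the idea of a classifier and detector pair more concrete, the following is a minimal Python sketch of a DDM-style detector driven by a stream of 0/1 prediction errors, together with an MCC helper computed from the four confusion counts listed in the abstract. It is an illustration under stated assumptions, not the implementation used in the study: the class name `SimpleDDM`, the functions `synthetic_errors`, `run_demo`, and `mcc`, the 30-instance warm-up, and the 2 and 3 standard deviation thresholds are all choices made here, and the deterministic error stream merely stands in for a real base classifier.

```python
import math


class SimpleDDM:
    """DDM-style detector over a stream of 0/1 prediction errors (illustrative)."""

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances  # assumed warm-up length
        self.warning_level = warning_level  # assumed warning threshold (in std devs)
        self.drift_level = drift_level      # assumed drift threshold (in std devs)
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0           # running error rate
        self.p_min = math.inf  # error rate at the lowest observed p + s
        self.s_min = math.inf  # standard deviation at that point

    def update(self, error):
        """Feed one error (1 = misclassified, 0 = correct) and return the state."""
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        if self.p + s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()  # in a real pair, the base learner would be replaced here
            return "drift"
        if self.p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


def synthetic_errors(n_total=1000, change_at=500):
    """Hypothetical error stream: 10% errors before the change, 50% after it."""
    for i in range(n_total):
        period = 10 if i < change_at else 2
        yield 1 if i % period == 0 else 0


def run_demo():
    """Return the stream positions at which the detector signals a drift."""
    detector = SimpleDDM()
    drifts = []
    for i, err in enumerate(synthetic_errors()):
        if detector.update(err) == "drift":
            drifts.append(i)
    return drifts


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    print("drift signalled at positions:", run_demo())
    print("MCC for a perfect detector:", mcc(50, 50, 0, 0))
```

In this sketch the gap between the true change point (position 500) and the first signalled position corresponds to the drift detection delay mentioned in the abstract. A full pair would also retrain or replace the base learner whenever `update` returns `"drift"`.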
