Journal: Engineering Applications of Artificial Intelligence

Batch and data streaming classification models for detecting adverse events and understanding the influencing factors



Abstract

Constructing effective models for detecting, reducing, and/or preventing adverse events is very important in domains such as aviation safety, healthcare, drug administration, and war theaters. This study presents batch and data streaming models for detecting adverse events using data from a war theater context. In all previous studies, regression models and several machine learning techniques were used to predict continuous values in an active theater of war, and the error values reported on the test sets were large. To overcome this shortcoming, this study investigates the effectiveness of batch and data streaming classification algorithms in detecting or classifying adverse events given infrastructure development spending data and other variables in an active theater of war in Afghanistan. Feature selection yields the valid input variables, and their index values show that the main inputs are the adverse events of the previous month (t-1), population densities, and related project investments. At the country level, relatively few of the 14 project investments affect adverse events. At the region level, the main influencing factors are projects with higher index values, such as Security in the South Western region, Energy and Emergency Assistance in the North Eastern region, and Education in the Eastern region. Three batch classification methods and three data streaming classification methods were assessed for their ability to detect adverse events given infrastructure development data. The study uses cost-sensitive measures to address the highly unbalanced nature of the data and applies variable reduction techniques to identify significant variables. The three batch classification algorithms are C4.5, k-Nearest Neighbor, and Support Vector Machine. The three data streaming algorithms are Naive Bayes, Hoeffding Tree, and Single Classifier Drift.
In general, the performance of the cost-sensitive methods in the batch setting is comparable to that in the data stream setting. However, in the batch setting the cost matrix must be adjusted manually. In contrast, the data stream setting allows the models to be adjusted based on analysis of the classifiers' performance over time and of the changing data distribution. Among the three data stream algorithms, Naive Bayes achieves the highest Kappa values for the whole country and for its regions, and it has the best global performance. Concept drifts can be observed in the Kappa statistic curve. At the region level, many models perform better and include more project-related investments than at the country level. In addition, as the data distribution becomes more balanced, the classifiers in the data stream setting outperform those in the batch setting in terms of overall classification rate. The results thus demonstrate the potential of data streaming algorithms to significantly outperform batch algorithms when the data become less unbalanced, and these algorithms can be used for detecting adverse events in similar areas.
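
The abstract reports Kappa values but does not spell out the statistic. As a minimal sketch, assuming the standard Cohen's kappa definition (observed agreement corrected for chance agreement), the snippet below shows why Kappa is more informative than raw accuracy on unbalanced data. The function name and the label arrays are hypothetical illustrations, not data or code from the study.

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between labels and predictions, corrected for chance."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement is the plain classification accuracy.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: per-class product of marginal counts, summed over classes.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case (a single class everywhere); returning 0.0 is a
        # convention chosen for this sketch, kappa is undefined here.
        return 0.0
    return (p_o - p_e) / (1.0 - p_e)

# Synthetic, unbalanced labels: 95 "no adverse event" (0), 5 "adverse event" (1).
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class: 95% accurate, kappa 0.0.
majority = [0] * 100
kappa_majority = cohen_kappa(y_true, majority)

# A classifier that finds 4 of the 5 events at the cost of 2 false alarms:
# 97% accurate, kappa about 0.71.
informed = [0] * 93 + [1] * 2 + [1] * 4 + [0] * 1
kappa_informed = cohen_kappa(y_true, informed)

print(kappa_majority, round(kappa_informed, 2))
```

In a stream setting, the same quantity would typically be recomputed over a sliding window of recent predictions, so that a sustained drop in the curve can hint at concept drift; the windowing details are not given in the abstract.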
