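Mining a ranked hierarchy of factors influencing the spatio-temporal diffusion of dengue fever with random forest
Journal: Hubei Agricultural Sciences (湖北农业科学)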




Classical statistical models, when used to quantify how strongly individual risk factors affect dengue fever (DF), cannot account for nonlinear risk factors and cannot explain the complex interactions among factors. To overcome these limitations, this study draws on spatio-temporal data mining theory. It selects 25 potential DF risk factors in 4 categories and screens them preliminarily with Pearson correlation analysis. A random forest (RF) is then trained on DF and its potential risk factors to identify the factors that drive the occurrence and diffusion of DF and to establish a ranked hierarchy of those factors. The results show that RF has stronger data mining ability than the traditional linear model. The risk factors fall into four tiers, from highest to lowest risk: first tier (population density, residential area, left neighborhood, right neighborhood); second tier (lower neighborhood, upper neighborhood); third tier (road, lower-left neighborhood, upper-right neighborhood, lower-right neighborhood, upper-left neighborhood, rainfall, O3, PM2.5, PM10, CO, NO2, pond); fourth tier (temperature, agricultural land, woodland). The RF model can effectively mine and quantify the risk factors that influence DF and explain the relationships among them.
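The preliminary screening step mentioned in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' code: the abstract gives no cut-off for the Pearson screening, so the `min_abs_r` threshold, the function names `pearson_r` and `screen_factors`, the factor name `noise_factor`, and the synthetic data are all assumptions made for illustration. In the study, the factors kept by this step would then be passed to a random forest, which is not shown here.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def screen_factors(factors, target, min_abs_r=0.1):
    """Keep factors whose |r| with the target reaches min_abs_r.

    min_abs_r is a hypothetical threshold; the abstract does not state one.
    Returns a dict mapping each retained factor name to its r value.
    """
    scores = {name: pearson_r(values, target) for name, values in factors.items()}
    return {name: r for name, r in scores.items() if abs(r) >= min_abs_r}


# Synthetic illustration only: these are not data from the study.
rng = random.Random(0)
n_cells = 200
population_density = [rng.uniform(0, 1) for _ in range(n_cells)]
noise_factor = [rng.uniform(0, 1) for _ in range(n_cells)]  # unrelated to cases
cases = [3.0 * p + rng.gauss(0, 0.2) for p in population_density]

retained = screen_factors(
    {"population_density": population_density, "noise_factor": noise_factor},
    cases,
    min_abs_r=0.3,
)
for name, r in sorted(retained.items()):
    print(f"{name}: r = {r:.3f}")
```

Pearson correlation only measures linear association, which is consistent with the abstract using it as a first-pass filter and leaving nonlinear effects and interactions to the random forest.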


