Source: The Scientific World Journal (listed under U.S. National Institutes of Health literature)
Machine Learning Model for Imbalanced Cholera Dataset in Tanzania

Abstract

Cholera epidemics have remained a public health threat throughout history, affecting vulnerable populations living with unreliable water supplies and substandard sanitary conditions. Various studies have observed that the occurrence of cholera is strongly linked to environmental factors such as climate change and geographical location. Climate change has been strongly linked to the seasonal occurrence and wide spread of cholera through the creation of weather patterns that favor the disease's transmission, infection, and the growth of Vibrio cholerae, the bacterium that causes the disease. Over the past decades, there have been great achievements in developing epidemic models for the prediction of cholera. However, the integration of weather variables and the use of machine learning techniques have not been explicitly deployed in modeling cholera epidemics in Tanzania, owing to challenges in the available datasets such as imbalanced data and missing information. This paper explores the use of machine learning techniques to model cholera epidemics in relation to seasonal weather changes while overcoming the data imbalance problem. The Adaptive Synthetic Sampling Approach (ADASYN) and Principal Component Analysis (PCA) were used to restore the sampling balance and reduce the dimensionality of the dataset, respectively. In addition, sensitivity, specificity, and balanced accuracy were used to evaluate the performance of seven models. Based on the results of the Wilcoxon signed-rank test and the features of the models, the XGBoost classifier was selected as the best model for the study. Overall, the results improved our understanding of the significant role of machine learning strategies in health-care data. However, the study could not be treated as a time series problem because of bias in data collection. The study recommends a review of health-care systems to facilitate quality data collection and the deployment of machine learning techniques.
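The abstract evaluates models with sensitivity, specificity, and balanced accuracy rather than plain accuracy. A minimal sketch of how these three metrics are conventionally computed from binary predictions is given here for illustration. The function name `score_binary`, the `BinaryScores` container, and the toy labels are hypothetical and are not taken from the paper or its data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryScores:
    sensitivity: float        # true positive rate: TP / (TP + FN)
    specificity: float        # true negative rate: TN / (TN + FP)
    balanced_accuracy: float  # mean of sensitivity and specificity


def score_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryScores:
    """Compute the three metrics for labels coded 1 (positive) and 0 (negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return BinaryScores(sensitivity, specificity, (sensitivity + specificity) / 2)


# Hypothetical imbalanced toy labels: 2 positive cases out of 10 (not the paper's data).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

scores = score_binary(y_true, y_pred)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# A classifier that always predicts the majority (negative) class.
majority = score_binary(y_true, [0] * len(y_true))

print(scores, plain_accuracy, majority)
```

On these toy labels the plain accuracy is 0.8, while the balanced accuracy is 0.6875. A classifier that always predicts the majority class also reaches 0.8 plain accuracy but only 0.5 balanced accuracy, which illustrates why class-sensitive metrics are commonly preferred on imbalanced data.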
