Ensemble-based model selection for imbalanced data to investigate the contributing factors to multiple fatality road crashes in Ghana

Abstract

The study aims to identify the variables most relevant to improving the predictive performance of a crash injury severity (CIS) classification model. CIS databases are almost always class-imbalanced: samples of the multiple fatal injury (MFI) severity class, for instance, are typically rare relative to the other classes. This imbalance may bias predictions in favour of the majority class and degrade the quality of the learning algorithm. The paper proposes an ensemble-based variable ranking scheme that incorporates data resampling. At the data pre-processing level, the majority weighted minority oversampling technique (MWMOTE) is employed to balance the imbalanced training data. An ensemble of classifiers induced from the balanced data is then used to evaluate and rank the individual variables according to their importance to injury severity prediction. The relevant variables selected are applied to the balanced data to form a training set for CIS classification modelling. An empirical comparison is conducted against three alternative variable rankings: 1) learning a single inductive algorithm on the imbalanced data, with the relevant variables applied to the imbalanced data to form the training data; 2) learning a single inductive algorithm on the MWMOTE data, with the relevant variables applied to the balanced data to form the training data; and 3) learning ensembles on the imbalanced data, with the relevant variables applied to the imbalanced data to form the training data. Bayesian network (BN) classifiers are then developed for each ranking method using nested subsets of the top-ranked variables. Model predictions are assessed with four performance indicators in the comparative study. Based on three years (2014-2016) of crash data from Ghana, the empirical results show that the proposed method is effective in identifying the most important predictors of the CIS level. Finally, based on the inference results of the BNs developed on the best subset, the study offers the most probable explanations for the occurrence of MFI crashes in Ghana.
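
The pipeline described in the abstract (oversample the minority class, rank variables with an ensemble trained on the balanced data, then form nested subsets of the top-ranked variables) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the oversampler is a plain SMOTE-style interpolation standing in for MWMOTE, the ensemble is a set of bagged one-variable threshold rules because the abstract does not name the base learners, the data are synthetic numeric features, and all function and variable names are hypothetical. Fitting Bayesian network classifiers on each nested subset is omitted.

```python
import random
from collections import Counter


def oversample_minority(X, y, minority, rng):
    """Balance classes by interpolating between random pairs of minority rows.

    SMOTE-style stand-in (assumption): MWMOTE additionally weights which
    minority samples are used, which is not reproduced here.
    """
    minority_rows = [row for row, label in zip(X, y) if label == minority]
    n_needed = max(Counter(y).values()) - len(minority_rows)
    X_bal, y_bal = list(X), list(y)
    for _ in range(n_needed):
        a, b = rng.sample(minority_rows, 2)
        t = rng.random()
        X_bal.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        y_bal.append(minority)
    return X_bal, y_bal


def stump_accuracy(values, labels, minority):
    """Best accuracy of a single-threshold rule on one variable (either direction)."""
    best, n = 0.0, len(values)
    for t in set(values):
        hits = sum((v > t) == (lab == minority) for v, lab in zip(values, labels))
        best = max(best, hits / n, 1 - hits / n)
    return best


def rank_variables(X, y, minority, n_models, rng):
    """Average each variable's stump accuracy over bootstrap resamples, then rank."""
    n, d = len(X), len(X[0])
    totals = [0.0] * d
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        labels = [y[i] for i in idx]
        for j in range(d):
            totals[j] += stump_accuracy([X[i][j] for i in idx], labels, minority)
    scores = [total / n_models for total in totals]
    order = sorted(range(d), key=lambda j: scores[j], reverse=True)
    return order, scores


# Synthetic, imbalanced toy data (assumption): variable 0 is strongly
# informative, variable 1 weakly informative, variable 2 pure noise.
rng = random.Random(0)
X, y = [], []
for _ in range(120):
    X.append([rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
    y.append("other")
for _ in range(8):
    X.append([rng.gauss(3, 1), rng.gauss(1, 1), rng.gauss(0, 1)])
    y.append("MFI")

X_bal, y_bal = oversample_minority(X, y, "MFI", rng)
order, scores = rank_variables(X_bal, y_bal, "MFI", n_models=10, rng=rng)

# Nested subsets of the top-ranked variables: top 1, top 2, top 3.
subsets = [order[:k] for k in range(1, len(order) + 1)]
print("ranking:", order)
print("nested subsets:", subsets)
```

In the study, each nested subset would then be used to train and evaluate a BN classifier, and the subset with the best performance indicators would be retained for inference.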

Bibliographic record

  • Source
    Accident Analysis and Prevention | 2021, Issue 3 | 105851.1-105851.12 | 12 pages
  • Author affiliations

    Southwest Jiaotong Univ Sch Transportat & Logist West Pk Chengdu 611756 Peoples R China|Natl Engn Lab Integrated Transportat Big Data App West Pk Chengdu 611756 Peoples R China;

    Tsinghua Univ Dept Civil Engn Suite 217 Heshangheng Bldg Beijing 100084 Peoples R China;

    Southwest Jiaotong Univ Sch Transportat & Logist West Pk Chengdu 611756 Peoples R China|Natl Engn Lab Integrated Transportat Big Data App West Pk Chengdu 611756 Peoples R China;

    Karare Univ Dept Informat Technol Omdurman 12304 Sudan;

    Univ Nairobi Dept Civil & Construct Engn Nairobi 30197 Kenya;

    Guangzhou Transportat Planning Inst Guangzhou 510030 Peoples R China;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    Multiple fatal injury crash; Classification; Model selection; Class imbalance; Oversampling; Ensemble classifiers;
