Journal: Safety Science

Predicting and analyzing injury severity: A machine learning-based approach using class-imbalanced proactive and reactive data

Abstract

Although the utility of machine learning (ML) techniques in the occupational accident domain is established for reactive data, their use in predicting injury severity from both reactive and proactive data is new. This calls for an investigation of the significance of both types of data in predicting injury severity with ML techniques. In addition, unstructured text and class imbalance in the data often make analysis difficult. To address these issues, this study uses two types of data collected from a steel plant: investigation reports (i.e., reactive data) and inspection reports (i.e., proactive data). The two datasets are merged to generate a mixed dataset. Topic modeling is used to handle the unstructured text. Four oversampling algorithms, namely Synthetic Minority Over-sampling Technique (SMOTE), borderline SMOTE (BLSMOTE), Majority Weighted Minority Oversampling Technique (MWMOTE), and k-means SMOTE (KMSMOTE), are applied separately to handle the class imbalance. Thereafter, six prediction algorithms, namely support vector machine, artificial neural network, Naive Bayes, k-nearest neighbour, classification and regression tree, and random forest, are applied to the reactive and mixed datasets separately for injury severity prediction. The results reveal that KMSMOTE performs better than the other algorithms in balancing the datasets and therefore helps achieve higher prediction performance in terms of average recall, F1-score, and geometric mean. It is also shown statistically that injury severity prediction performance is significantly higher using the mixed dataset than using the reactive dataset alone. Finally, a set of 19 crisp safety decision rules is generated using the tolerance rough set approach (TRSA); these rules can explain the factors responsible for the injury severity outcomes, namely 'Fatal', 'Medical case', and 'First-aid'.
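
To make the oversampling and evaluation ideas in the abstract more concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It shows the basic SMOTE interpolation step (a synthetic point placed on the segment between a minority sample and one of its nearest minority neighbours) and two of the reported metrics. The function names, the toy feature vectors, and the toy predictions are all hypothetical. The geometric mean is assumed here to be the geometric mean of per-class recalls, a common convention that the abstract does not spell out.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Basic SMOTE idea: interpolate between a minority sample and one of
    its k nearest minority neighbours. Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def per_class_recall(y_true, y_pred):
    """Recall of each class: correctly predicted / actual members."""
    recalls = {}
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        hits = sum(1 for i in idx if y_pred[i] == label)
        recalls[label] = hits / len(idx)
    return recalls


def average_recall_and_gmean(y_true, y_pred):
    """Macro-averaged recall and geometric mean of per-class recalls
    (assumed definition of 'geometric mean')."""
    recalls = list(per_class_recall(y_true, y_pred).values())
    avg = sum(recalls) / len(recalls)
    gmean = math.prod(recalls) ** (1 / len(recalls))
    return avg, gmean


# Toy data (hypothetical): four minority-class feature vectors.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_new=6)

# Toy labels and predictions (hypothetical), using the three outcome classes.
y_true = ["Fatal", "Fatal", "Medical case", "Medical case",
          "First-aid", "First-aid", "First-aid", "First-aid"]
y_pred = ["Fatal", "Medical case", "Medical case", "Medical case",
          "First-aid", "First-aid", "First-aid", "Medical case"]
avg_recall, gmean = average_recall_and_gmean(y_true, y_pred)
print(len(new_points), avg_recall, round(gmean, 3))
```

The geometric mean drops sharply when any single class has low recall, which is why it is often reported alongside average recall for imbalanced problems such as this one, where 'Fatal' cases are rare. The variants named in the abstract (BLSMOTE, MWMOTE, KMSMOTE) differ mainly in how they choose which minority samples to interpolate from; that selection logic is not shown here.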
