ASCE International Workshop on Computing in Civil Engineering

An Ensemble Approach for Classification of Accident Narratives



Abstract

There is increasing interest in using text mining techniques to automatically classify text-based accident descriptions in industries such as aviation, healthcare, and construction. Automatic classification of accident narratives makes it possible to analyze large text databases and gain insights into accidents and near misses. Several machine learning and text mining approaches, such as support vector machines (SVM), naive Bayes, and neural networks, have been used in the literature to classify accident narratives. In recent years, ensemble approaches have gained popularity in machine learning applications because they combine multiple learning algorithms into a single, stronger learner that often yields better results. This study therefore evaluates the effectiveness of an ensemble approach, which often outperforms any single learning algorithm, built from popular machine learning algorithms: support vector machine, decision tree, linear regression, k-nearest neighbors, naive Bayes, and neural network. Analyzing the accident narratives reported in construction safety data yields knowledge that can improve understanding of what went wrong in the past and inform the precautionary measures needed to prevent future accidents. One thousand accident narratives obtained from the US OSHA website are used in this study. The approach uses unigram tokenization, a tf-idf document-term matrix representation, and 11 class labels. Across the 11 accident-type labels, the precision of the ensemble model ranged from 0.6 to 1.0, recall ranged from 0.18 to 0.96, and the F1 score ranged from 0.14 to 0.96. The highest average F1 score reported herein was 0.69.
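The pipeline outlined in the abstract (unigram tokens, a tf-idf document-term matrix, several base classifiers combined into an ensemble, and per-label precision, recall, and F1) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the base models are combined, so hard majority voting is assumed here; the idf variant log(N / df) is also an assumption. The base classifiers themselves are not implemented, and the narratives, model predictions, and labels (`fall`, `struck_by`, `caught_in`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Unigram tokenization: lowercase, then keep runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_matrix(docs):
    """Build a sparse tf-idf document-term matrix (one dict per document).

    Assumed weighting: raw term count * log(N / df). Other variants exist.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    return [
        {term: freq * idf[term] for term, freq in Counter(toks).items()}
        for toks in tokenized
    ]


def majority_vote(predictions_per_model):
    """Hard voting: for each document, choose the label most models predicted.

    Ties go to the tied label encountered first in model order, because
    Counter.most_common keeps insertion order among equal counts.
    """
    n_docs = len(predictions_per_model[0])
    voted = []
    for i in range(n_docs):
        votes = Counter(preds[i] for preds in predictions_per_model)
        voted.append(votes.most_common(1)[0][0])
    return voted


def per_label_scores(y_true, y_pred):
    """Per-label (precision, recall, F1); a zero denominator gives 0.0."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical narratives and their true accident-type labels.
docs = [
    "Worker fell from scaffold",
    "Employee struck by falling beam",
    "Worker caught between excavator and wall",
]
y_true = ["fall", "struck_by", "caught_in"]
matrix = tfidf_matrix(docs)  # features that real base models would consume

# Hypothetical predictions from three trained base classifiers.
svm_preds = ["fall", "struck_by", "struck_by"]
tree_preds = ["fall", "fall", "caught_in"]
knn_preds = ["struck_by", "struck_by", "struck_by"]

ensemble = majority_vote([svm_preds, tree_preds, knn_preds])
scores = per_label_scores(y_true, ensemble)
# Macro average: unweighted mean of per-label F1 (one possible averaging).
macro_f1 = sum(f1 for _, _, f1 in scores.values()) / len(scores)
```

With these made-up inputs the vote yields `["fall", "struck_by", "struck_by"]`, so `fall` scores perfectly, `struck_by` has precision 0.5 and recall 1.0, and `caught_in` is never predicted, giving it an F1 of 0. A label that the ensemble rarely predicts correctly drags the macro-averaged F1 down in the same way, which is one reason per-label scores can vary as widely as the ranges reported above.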
