Conference: IEEE Symposium Series on Computational Intelligence

Development of an Unsupervised Learning Methods for Classification of Accident Reports without Code Information

Abstract

For decades, analyzing accidents in which a product may be at fault has been essential to improving safety, and hazard and harm investigations are conducted to address problems with the product and prevent future accidents. Many researchers have therefore proposed methods for collecting and analyzing accident reports. Although most previous methods apply to numerical data, a large amount of text data exists in the form of accident reports, which contain rich information such as a detailed account of the events of each accident. This information is useful for classifying accident types, analyzing accident factors, and many other risk management analyses. Because of the scale of the data, analyzing it correctly requires a framework and an associated categorization scheme. However, injury data are usually tagged manually, which is difficult for large databases of accident reports. To address this issue, we propose a classification method that uses text mining, Japanese grammar patterns, and machine learning to automate the classification of injury data. The proposed method consists of word vectorization followed by clustering. In this research, we used product accident data with 260 injury data entries and 116 products in 970 accident reports provided by the National Institute of Technology and Evaluation (NITE). For the classification, we used five Mechanism codes based on the International Classification of External Causes of Injuries (ICECI). In the clustering results, the ratio of correct classification for the ICECI-based Mechanism code exceeded 0.8 for clusters 0, 1, and 2, and we found that many of the data can be classified into a type of ICECI Mechanism code. We attribute the high ratio of correct classification in these clusters to the successful extraction of the words needed for classification from the analysis reports. Clusters with a low ratio of correct classification, on the other hand, resulted from the extraction of many words that are not needed for classification, which introduced noise into the word vector model.
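
As a rough illustration of the pipeline described in the abstract (word vectorization, clustering, then a per-cluster ratio of correct classification), the following Python sketch uses stand-ins throughout. Bag-of-words counts replace the word vector model, a plain k-means replaces whatever clustering algorithm the authors used, and the ratio is assumed to be the share of a cluster's reports that carry the cluster's majority code. The tokens, the codes `thermal` and `fall`, and all function names are hypothetical and do not come from the NITE data or the paper.

```python
from collections import Counter


def vectorize(docs):
    """Turn tokenized reports into bag-of-words count vectors (assumed stand-in)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for tok in doc:
            vec[index[tok]] += 1.0
        vectors.append(vec)
    return vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain k-means; the first k vectors seed the centroids so runs are repeatable."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def correct_ratio_per_cluster(assignment, codes):
    """Assumed definition: share of a cluster's reports with its majority code."""
    ratios = {}
    for c in sorted(set(assignment)):
        member_codes = [code for a, code in zip(assignment, codes) if a == c]
        majority_count = Counter(member_codes).most_common(1)[0][1]
        ratios[c] = majority_count / len(member_codes)
    return ratios


# Hypothetical toy reports, already tokenized; the last one mixes vocabulary
# from both groups and plays the role of a noisy report.
docs = [
    ["burn", "hot", "water"],
    ["fall", "ladder", "step"],
    ["burn", "hot", "kettle"],
    ["fall", "ladder", "slip"],
    ["burn", "kettle", "water"],
    ["fall", "step", "slip"],
    ["fall", "ladder", "hot"],
]
codes = ["thermal", "fall", "thermal", "fall", "thermal", "fall", "thermal"]

assignment = kmeans(vectorize(docs), k=2)
ratios = correct_ratio_per_cluster(assignment, codes)
print(assignment)  # [0, 1, 0, 1, 0, 1, 1]
print(ratios)      # {0: 1.0, 1: 0.75}
```

In this toy run the last report shares more words with the fall reports than with the thermal ones, so it lands in cluster 1 and pulls that cluster's ratio down to 0.75. This mirrors, in a very simplified way, the abstract's point that words not needed for classification lower the ratio of correct classification.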
