Conference: IEEE Symposium Series on Computational Intelligence

Development of an Unsupervised Learning Methods for Classification of Accident Reports without Code Information



Abstract

For decades, the analysis of accidents in which a product may be at fault has been extremely important for improving safety, and investigations of danger and harm are conducted to address issues with the product and prevent future accidents. Many researchers have therefore proposed methods for collecting and analyzing accident reports. Although most previous methods apply to numerical data, a large amount of text data exists in the form of accident reports, which contain rich information such as a detailed account of the events of each accident. This information is useful for classifying accident types, analyzing accident factors, and many other risk management analyses. Because of the scale of the data, analyzing it correctly requires a framework and a categorization scheme built on it. However, injury data are usually tagged manually, which proves difficult for large databases of accident reports. To solve this issue, we propose a classification method that uses text mining, Japanese grammar patterns, and machine learning to automate the classification of injury data. The algorithm of the proposed method consists of word vectorization followed by clustering. In this research, we used product accident data with 260 injury data and 116 products in 970 accident reports provided by the National Institute of Technology and Evaluation (NITE). For the classification, we used five Mechanism codes based on the International Classification of External Causes of Injury (ICECI). In the clustering results, the ratio of correct classification for the ICECI-based Mechanism code exceeded 0.8 for clusters 0, 1, and 2, and we found that many of the data can be classified into a single type of ICECI Mechanism code. We attribute the high correct classification ratio of these clusters to the method's ability to extract the words necessary for classification from the analysis reports. In contrast, clusters with a low ratio of correct classification arose because many words that are not necessary for classification were extracted and introduced noise into the word vector model.
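
The pipeline described in the abstract (vectorize words, cluster, then measure the ratio of correct classification per cluster) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: bag-of-words counts stand in for the paper's word vector model, the k-means routine uses fixed starting points, the seven tokenized reports are invented toy data, and the labels "thermal" and "fall" are placeholders rather than the five ICECI Mechanism codes used in the study.

```python
from collections import Counter


def vectorize(docs):
    """Turn tokenized reports into bag-of-words count vectors (stand-in for a word vector model)."""
    vocab = sorted({word for doc in docs for word in doc})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for word in doc:
            vec[index[word]] += 1.0
        vectors.append(vec)
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, iterations=20):
    """Plain k-means; the initial centroids are the vectors at init_indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Recompute each centroid as the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def correct_ratio_per_cluster(assignment, codes):
    """For each cluster, the share of members that carry the cluster's most common code."""
    ratios = {}
    for c in sorted(set(assignment)):
        member_codes = [code for a, code in zip(assignment, codes) if a == c]
        top_count = Counter(member_codes).most_common(1)[0][1]
        ratios[c] = top_count / len(member_codes)
    return ratios


# Hypothetical toy reports, already tokenized; not taken from the NITE data.
reports = [
    ["stove", "fire", "burn"],
    ["heater", "fire", "burn"],
    ["iron", "burn", "hot"],
    ["ladder", "fall", "fracture"],
    ["chair", "fall", "bruise"],
    ["stairs", "fall", "fracture"],
    ["chair", "fall", "hot"],  # labeled "thermal", but its words resemble the fall reports
]
codes = ["thermal", "thermal", "thermal", "fall", "fall", "fall", "thermal"]

vectors = vectorize(reports)
clusters = kmeans(vectors, init_indices=[0, 3])
ratios = correct_ratio_per_cluster(clusters, codes)
print(ratios)  # {0: 1.0, 1: 0.75}
```

In this toy run the last report lands in the cluster dominated by fall reports because the words "chair" and "fall" outweigh "hot", which lowers that cluster's ratio to 0.75. This mirrors, on a very small scale, the abstract's observation that words unnecessary for classification add noise and reduce the ratio of correct classification.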
