IEEE Symposium Series on Computational Intelligence

Development of an Unsupervised Learning Methods for Classification of Accident Reports without Code Information




For decades, analyzing accidents in which a product may be at fault has been extremely important for improving safety, and danger and harm investigations are conducted to address issues with the product and prevent future accidents. Many researchers have therefore proposed methods for collecting and analyzing accident reports. Although most previous methods apply to numerical data, a large amount of text data exists in the form of accident reports, which contain rich information such as a detailed account of the events of each accident. This information is useful for classifying accident types, analyzing accident factors, and many other risk management analyses. Because of the scale of the data, analyzing it correctly requires a framework and an associated categorization scheme. However, injury data are usually tagged manually, which is difficult for large databases of accident reports. To solve this issue, we propose a classification method that uses text mining, Japanese grammar patterns, and machine learning to automate the classification of injury data. The algorithm of the proposed method consists of word vectorization followed by clustering. In this research, we used product accident data with 260 injury data and 116 products in 970 accident reports provided by the National Institute of Technology and Evaluation (NITE). For the classification, we used five Mechanism codes based on the International Classification of External Causes of Injury (ICECI). In the clustering results, the ratio of correct classification for the ICECI-based Mechanism code exceeded 0.8 for clusters 0, 1, and 2, and we found that many of the data could be classified into one type of ICECI Mechanism code. We consider that the clusters with a high ratio of correct classification achieved this result because the method was able to extract the words necessary for classification from the accident reports. On the other hand, clusters with a low ratio of correct classification showed that result because many words that are not necessary for classification were extracted and introduced noise into the word vector model.
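The pipeline described above (word vectorization, clustering, then a per-cluster ratio of correct classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the word vector model or the clustering algorithm, so the sketch assumes normalized bag-of-words vectors and a small k-means with a deterministic farthest-point initialization. The toy reports, the whitespace tokenization (real reports would need Japanese morphological analysis), and the code labels `"thermal"` and `"blunt"` are all hypothetical placeholders, as are the function names.

```python
import math
from collections import Counter


def vectorize(docs):
    """Turn pre-tokenized reports into L2-normalized bag-of-words vectors.

    Assumption: tokens are separated by whitespace. This stands in for
    the paper's word vector model, which the abstract does not detail.
    """
    vocab = sorted({w for d in docs for w in d.split()})
    vectors = []
    for d in docs:
        counts = Counter(d.split())
        vec = [float(counts[w]) for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iters=10):
    """Plain k-means with farthest-point initialization (an assumption)."""
    centers = [list(vectors[0])]
    while len(centers) < k:
        # Pick the vector farthest from its nearest existing center.
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest center for each vector.
        labels = [
            min(range(k), key=lambda c: sq_dist(v, centers[c]))
            for v in vectors
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def correct_ratio(labels, codes):
    """Per cluster, the share of members carrying its most common code."""
    ratios = {}
    for c in sorted(set(labels)):
        member_codes = [code for lab, code in zip(labels, codes) if lab == c]
        top_count = Counter(member_codes).most_common(1)[0][1]
        ratios[c] = top_count / len(member_codes)
    return ratios


# Hypothetical toy reports and placeholder mechanism labels.
docs = [
    "heater fire burn smoke",
    "stove fire burn",
    "charger fire smoke burn",
    "ladder fall fracture",
    "chair fall fracture bruise",
    "stool fall bruise",
]
codes = ["thermal", "thermal", "thermal", "blunt", "blunt", "blunt"]

vocab, vectors = vectorize(docs)
labels = kmeans(vectors, k=2)
ratios = correct_ratio(labels, codes)
```

In this sketch, the known codes are used only to evaluate the clusters after the fact, which matches the unsupervised setting in the abstract. A cluster whose ratio is close to 1 is dominated by a single code, whereas reports containing many words unrelated to the mechanism would pull vectors toward the wrong center and lower the ratio.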


