
Statistical analysis of DMV crash data



Abstract

This paper presents the statistical methods and models we used to identify factors that cause fatal car crashes and high damage costs. The benefit of the project is that the Virginia DMV can make adjustments accordingly and reduce the number of crashes that are fatal or have high damage costs. We used data from 2010 to 2014 for both the fatality analysis and the damage cost analysis; data from 2015 were used for the fatality analysis only. In the first part of the paper, we describe how we identified factors that cause fatal car crashes. Because the data are imbalanced, we first subsampled the non-fatal crashes and applied a higher weight to fatal crashes. We then used a logistic regression model to predict whether a crash is fatal. To select the more important features, we kept only numeric factors with a correlation value greater than 0.1. The logistic regression achieved a recall of 40% in prediction. We also applied decision trees in the fatality analysis and built two models, for the 2010-2014 data and for the 2015 data. In the second part of the paper, we discuss how we identified factors that drive damage cost. Because the values of the damage cost variable are imbalanced, we propose a two-stage method to find the critical factors of damage cost. First, we used K-nearest neighbors (KNN) to predict whether the damage cost is zero. Second, we fit a Lasso regression on the records with nonzero damage cost and identified the factors that lead to damage cost.
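The two-stage damage cost method described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic, the feature names `x0`, `x1`, `x2` are hypothetical, and the helper functions `knn_predict` and `lasso_fit`, the choice of `k=5`, and the penalty `lam=0.2` are all assumptions made for illustration. The sketch uses only the Python standard library so that it runs without extra dependencies; a real analysis would more likely rely on a statistics or machine learning library.

```python
import math
import random

def knn_predict(train_X, train_y, x, k=5):
    """Stage 1 helper: majority vote among the k nearest training rows."""
    nearest = sorted(zip((math.dist(row, x) for row in train_X), train_y))[:k]
    return 1 if 2 * sum(label for _, label in nearest) > k else 0

def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso_fit(X, y, lam, n_iter=100):
    """Stage 2 helper: coordinate descent for
    (1/2n) * ||y - b - Xw||^2 + lam * ||w||_1."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # The intercept is not penalized: set it to the mean residual.
        b = sum(y[i] - sum(w[j] * X[i][j] for j in range(p)) for i in range(n)) / n
        for j in range(p):
            # Correlate feature j with the residual that leaves feature j out.
            rho = sum(
                X[i][j] * (y[i] - b - sum(w[k] * X[i][k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w, b

# Synthetic stand-in for crash records (assumption, not DMV data).
rng = random.Random(0)
names = ["x0", "x1", "x2"]  # hypothetical numeric crash attributes
X, cost = [], []
for _ in range(300):
    row = [rng.uniform(-1, 1) for _ in names]
    X.append(row)
    # Made-up rule: cost is zero unless x0 > 0; nonzero cost depends on x1 only.
    cost.append(4.0 + 3.0 * row[1] + rng.gauss(0, 0.1) if row[0] > 0 else 0.0)

train_X, test_X = X[:200], X[200:]
train_cost, test_cost = cost[:200], cost[200:]

# Stage 1: classify zero versus nonzero damage cost with KNN.
train_y = [1 if c != 0 else 0 for c in train_cost]
test_y = [1 if c != 0 else 0 for c in test_cost]
pred = [knn_predict(train_X, train_y, x) for x in test_X]
accuracy = sum(p == t for p, t in zip(pred, test_y)) / len(test_y)

# Stage 2: Lasso on the nonzero-cost training rows only.
nz_X = [x for x, c in zip(train_X, train_cost) if c != 0]
nz_y = [c for c in train_cost if c != 0]
weights, intercept = lasso_fit(nz_X, nz_y, lam=0.2)
selected = [name for name, w in zip(names, weights) if w != 0.0]

print("stage 1 accuracy:", accuracy)
print("stage 2 weights:", dict(zip(names, weights)))
print("factors kept by Lasso:", selected)
```

Splitting the problem this way keeps the large mass of zero-cost records from dominating the regression: the classifier handles the zero versus nonzero question, and the L1 penalty in the second stage drives the coefficients of uninformative factors to exactly zero, so the surviving factors can be read off directly. The sketch assumes features on a comparable scale; with real data they would normally be standardized before fitting the Lasso.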
