Journal: Transportation research

A novel variable selection method based on frequent pattern tree for real-time traffic accident risk prediction


Abstract

With the availability of large volumes of real-time traffic flow data along with traffic accident information, there is renewed interest in developing models for the real-time prediction of traffic accident risk. One challenge, however, is that the available data are usually complex, noisy, and even misleading. This raises the question of how to select the most important explanatory variables to achieve an acceptable level of accuracy for real-time traffic accident risk prediction. To address this, the present paper proposes a novel Frequent Pattern tree (FP tree) based variable selection method. The method works by first identifying all the frequent patterns in the traffic accident dataset. Next, for each frequent pattern, we introduce a new metric, herein referred to as the Relative Object Purity Ratio (ROPR). The ROPR is then used to calculate the importance score of each explanatory variable, which in turn can be used for ranking and selecting the variables that contribute most to explaining the accident patterns. To demonstrate the advantages of the proposed variable selection method, the study develops two traffic accident risk prediction models, a k-nearest neighbor model and a Bayesian network, using accident data collected on Interstate highway I-64 in Virginia. Prior to model development, two variable selection methods are applied: (1) the FP tree based method proposed in this paper; and (2) the random forest method, a widely used variable selection method, which serves as the base case for comparison. The results show that the FP tree based accident risk prediction models perform better than the random forest based models, regardless of the type of prediction model (i.e., k-nearest neighbor or Bayesian network), the settings of their parameters, and the types of datasets used for model training and testing. The best model found is an FP tree based Bayesian network model that predicts 61.11% of accidents with a false alarm rate of 38.16%. These results compare very favorably with other accident prediction models reported in the literature. © 2015 Elsevier Ltd. All rights reserved.
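
The abstract does not define the ROPR or the importance score, so the pipeline it describes (mine frequent patterns, score each pattern, aggregate the scores per variable, rank) can only be illustrated under assumptions. The Python sketch below is one such illustration and is not the authors' method: frequent patterns are enumerated by brute force rather than with an FP tree (an FP tree would find the same frequent itemsets more efficiently), and the per-pattern score is a hypothetical stand-in, namely how far the accident share among matching records departs from the overall accident rate. The names `rank_variables`, `min_support`, `max_len`, and the toy variables `speed_var` and `weather` are all invented for the example.

```python
from collections import Counter
from itertools import combinations


def rank_variables(records, labels, min_support=0.2, max_len=2):
    """Rank variables by a purity-style score over frequent patterns.

    records: list of dicts mapping variable name -> discretized value.
    labels:  list of 0/1 flags (1 = accident case, 0 = non-accident case).
    The scoring rule is a hypothetical stand-in, NOT the paper's ROPR.
    """
    n = len(records)
    base_rate = sum(labels) / n  # overall share of accident cases

    # Count, for every itemset of up to max_len (variable, value) pairs,
    # how many records match it and how many of those are accidents.
    # Brute-force enumeration stands in for FP tree mining here.
    total = Counter()
    accidents = Counter()
    for rec, y in zip(records, labels):
        items = sorted(rec.items())
        for k in range(1, max_len + 1):
            for pattern in combinations(items, k):
                total[pattern] += 1
                accidents[pattern] += y

    # Aggregate pattern scores per variable, using frequent patterns only.
    scores = {var: 0.0 for var in records[0]}
    for pattern, count in total.items():
        support = count / n
        if support < min_support:
            continue  # not a frequent pattern
        purity_gap = abs(accidents[pattern] / count - base_rate)
        for var, _value in pattern:
            scores[var] += support * purity_gap

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: accidents coincide with high speed variance, not with weather.
records = [
    {"speed_var": "high", "weather": "rain"},
    {"speed_var": "high", "weather": "clear"},
    {"speed_var": "high", "weather": "rain"},
    {"speed_var": "low", "weather": "clear"},
    {"speed_var": "low", "weather": "rain"},
    {"speed_var": "low", "weather": "clear"},
]
labels = [1, 1, 1, 0, 0, 0]

ranking = rank_variables(records, labels)
for var, score in ranking:
    print(f"{var}: {score:.3f}")
```

On this toy data `speed_var` ranks above `weather`, because every frequent pattern containing a `speed_var` value matches only accident cases or only non-accident cases. A selection step would then keep the top-ranked variables as inputs to a downstream classifier such as the k-nearest neighbor model or Bayesian network mentioned in the abstract.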