
Finding outliers in models of spatial data


Abstract

Statistical models fit to data often require extensive and challenging re-estimation before reaching final form. For example, outliers can adversely affect fits. In other cases involving spatial data, a cluster may exist for which the model is incorrect, which also degrades the fit to the "good" data. In both cases, estimated residuals must be checked and rechecked until the data are cleaned and an appropriate model is found. In this article, we demonstrate an algorithm that fits models to the largest subset of the data for which the model is appropriate. Specifically, if a hypothesized linear regression model fits ninety percent of the data, our algorithm not only finds an excellent fit, as if only the "good" data had been presented, but also highlights the ten percent of "bad" data that is not fit. Our work in digital government has focused on mapping data. Thus we illustrate how models fit to census tract data work, and how the data in the "bad" set can be viewed spatially through ArcView or other tools. This approach greatly simplifies the task of modeling spatial data, and makes use of advanced map visualization tools to understand the nature of subsets of the data for which the model is not appropriate.
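The abstract does not say which estimator the authors use, so the following is only a rough sketch of the general idea (fit a linear regression to the best-fitting ninety percent of the data and flag the rest). It swaps in a simple least trimmed squares procedure with random starts and concentration steps, which should not be read as the article's algorithm. The function name `trimmed_fit`, its parameters, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def trimmed_fit(X, y, keep=0.9, n_starts=50, n_steps=20, seed=0):
    """Hypothetical helper: least trimmed squares via concentration steps.

    Returns the coefficients (intercept first) fitted to the best-fitting
    `keep` fraction of rows, and a boolean mask of the rows left out.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])   # design matrix with intercept
    h = int(np.ceil(keep * n))             # size of the "good" subset
    best_cost, best_beta, best_idx = np.inf, None, None
    for _ in range(n_starts):
        # Start from a small random subset, then alternate fit / reselect.
        idx = rng.choice(n, size=p + 2, replace=False)
        for _ in range(n_steps):
            beta, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
            sq_resid = (y - A @ beta) ** 2
            idx = np.argsort(sq_resid)[:h]  # keep the h smallest residuals
        cost = np.sort(sq_resid)[:h].sum()  # trimmed sum of squares for beta
        if cost < best_cost:
            best_cost, best_beta, best_idx = cost, beta, idx
    flagged = np.ones(n, dtype=bool)
    flagged[best_idx] = False               # True marks the "bad" rows
    return best_beta, flagged

# Synthetic illustration (assumed data, not from the article):
# 180 rows follow y = 1 + 2x + noise; 20 rows form a shifted cluster.
rng = np.random.default_rng(42)
x = rng.uniform(0.0, 10.0, size=200)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, size=200)
is_bad = np.zeros(200, dtype=bool)
is_bad[:20] = True
x[is_bad] = rng.uniform(8.0, 10.0, size=20)
y[is_bad] = 1.0 + 2.0 * x[is_bad] + 15.0

beta, flagged = trimmed_fit(x.reshape(-1, 1), y, keep=0.9)
print("coefficients:", beta)
print("rows flagged as not fit:", np.flatnonzero(flagged))
```

In a spatial setting, the `flagged` mask could be joined back to the tract identifiers and mapped in a GIS tool to see whether the poorly fit rows cluster geographically.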
