Statistical modeling of large-scale simulation data


Abstract

With the advent of fast computer systems, scientists are now able to generate terabytes of simulation data. Unfortunately, the sheer size of these data sets has made efficient exploration of them impossible. To aid scientists in gleaning insight from their simulation data, we have developed an ad-hoc query infrastructure. Our system, called AQSim (short for Ad-hoc Queries for Simulation), reduces the data storage requirements and query access times in two stages. First, it creates and stores mathematical and statistical models of the data at multiple resolutions. Second, it evaluates queries on the models of the data instead of on the entire data set. In this paper, we present two simple but effective statistical modeling techniques for simulation data. Our first modeling technique computes the "true" (unbiased) mean of systematic partitions of the data. It makes no assumptions about the distribution of the data and uses a variant of the root mean square error to evaluate a model. Our second statistical modeling technique uses the Anderson-Darling goodness-of-fit method on systematic partitions of the data. This method evaluates a model by how well it passes the normality test on the data. Both of our statistical models effectively answer range queries. At each resolution of the data, we compute the precision of our answer to the user's query by scaling the one-sided Chebyshev inequalities with the original mesh's topology. We combine precisions at different resolutions by calculating their weighted average. Our experimental evaluations on two scientific simulation data sets illustrate the value of using these statistical modeling techniques on multiple resolutions of large simulation data sets.
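The first modeling technique and the precision estimate can be illustrated with a minimal sketch. It is not AQSim's implementation: the function names, the fixed-size blocks standing in for "systematic partitions", the plain (unmodified) root mean square error, and the caller-supplied weights are all assumptions made for illustration, and the scaling by mesh topology mentioned in the abstract is not reproduced. The one-sided Chebyshev (Cantelli) inequality used here is the standard bound P(X - mu >= k*sigma) <= 1 / (1 + k^2) for k > 0.

```python
import math
from statistics import fmean


def partition_models(values, block_size):
    """Summarise consecutive fixed-size blocks as (mean, rmse, count).

    Hypothetical helper: fixed-size blocks are an assumed stand-in for
    the paper's systematic partitions.
    """
    models = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        mu = fmean(block)
        # RMSE of the block against its own mean (plain form, not the
        # paper's variant).
        rmse = math.sqrt(fmean([(v - mu) ** 2 for v in block]))
        models.append((mu, rmse, len(block)))
    return models


def cantelli_upper_tail(mu, sigma, threshold):
    """Upper bound on the fraction of values >= threshold (one-sided Chebyshev)."""
    if threshold <= mu:
        return 1.0  # the bound is uninformative at or below the mean
    if sigma == 0:
        return 0.0  # every value equals the mean
    k = (threshold - mu) / sigma
    return 1.0 / (1.0 + k * k)


def weighted_precision(precisions, weights):
    """Weighted average of per-resolution precisions (weights are assumed inputs)."""
    return sum(p * w for p, w in zip(precisions, weights)) / sum(weights)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 10.0, 10.0, 10.0, 10.0]
    models = partition_models(data, block_size=4)
    mu, rmse, _ = models[0]
    print(models)
    print(cantelli_upper_tail(mu, rmse, threshold=4.0))
```

A range query such as "how much of this partition can lie at or above 4.0?" is then answered from the stored mean and error alone, without touching the raw values. Because the bound makes no distributional assumption, it is conservative: the true fraction in the first block is 0.25, and the bound is somewhat looser than that.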