Statistical modeling of large-scale simulation data

Abstract

With the advent of fast computer systems, scientists are now able to generate terabytes of simulation data. Unfortunately, the sheer size of these data sets has made efficient exploration of them impossible. To aid scientists in gleaning insight from their simulation data, we have developed an ad hoc query infrastructure. Our system, called AQSim (short for Ad-hoc Queries for Simulation), reduces data storage requirements and query access times in two stages. First, it creates and stores mathematical and statistical models of the data at multiple resolutions. Second, it evaluates queries on the models of the data instead of on the entire data set. In this paper, we present two simple but effective statistical modeling techniques for simulation data. Our first modeling technique computes the "true" (unbiased) mean of systematic partitions of the data. It makes no assumptions about the distribution of the data and uses a variant of the root mean square error to evaluate a model. Our second statistical modeling technique applies the Anderson-Darling goodness-of-fit method to systematic partitions of the data. This method evaluates a model by how well the data in each partition pass a normality test. Both of our statistical models effectively answer range queries. At each resolution of the data, we compute the precision of our answer to the user's query by scaling the one-sided Chebyshev inequalities with the original mesh's topology. We combine precisions at different resolutions by calculating their weighted average. Our experimental evaluations on two scientific simulation data sets illustrate the value of using these statistical modeling techniques on multiple resolutions of large simulation data sets.
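The abstract names several computational pieces: per-partition mean models, an error score for a model, a normality check, a one-sided Chebyshev bound for range queries, and a weighted average across resolutions. The Python sketch below illustrates each piece under simplifying assumptions; it is not the AQSim implementation. It assumes 1-D data split into fixed-size contiguous blocks (a hypothetical stand-in for systematic partitions of a mesh), scores the mean model with plain RMSE rather than the paper's unspecified variant, computes the standard Anderson-Darling A² for normality with the D'Agostino-Stephens small-sample adjustment and an approximate 5% critical value of 0.752, applies Cantelli's inequality (the one-sided Chebyshev inequality) without the paper's mesh-topology scaling, and takes resolution weights as caller-supplied inputs. All function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean, pstdev, stdev

_STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def partition(values, block_size):
    """Split a 1-D sequence into contiguous blocks (hypothetical stand-in
    for systematic partitions of a simulation mesh)."""
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def model_rmse(values, block_size):
    """Plain RMSE of replacing every value by its block mean.
    (Assumption: the paper uses an unspecified variant of RMSE.)"""
    sq_err = 0.0
    for block in partition(values, block_size):
        m = fmean(block)
        sq_err += sum((x - m) ** 2 for x in block)
    return math.sqrt(sq_err / len(values))


def anderson_darling_normal(block):
    """Anderson-Darling A*^2 for normality, mean and std estimated from the
    block, with the D'Agostino-Stephens small-sample adjustment.
    Needs >= 2 distinct values and no extreme outliers (cdf rounding to 0 or 1)."""
    n = len(block)
    mu, sigma = fmean(block), stdev(block)
    # Fitted normal CDF at each sorted observation.
    z = [_STD_NORMAL.cdf((x - mu) / sigma) for x in sorted(block)]
    s = sum((2 * i + 1) * (math.log(z[i]) + math.log(1.0 - z[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    return a2 * (1.0 + 0.75 / n + 2.25 / n ** 2)


def passes_normality(block, critical=0.752):
    """True if normality is not rejected at roughly the 5% level."""
    return anderson_darling_normal(block) < critical


def block_summaries(values, block_size):
    """The stored 'model': (mean, population std, count) for each block."""
    return [(fmean(b), pstdev(b), len(b)) for b in partition(values, block_size)]


def cantelli_upper_bound(mu, sigma, t):
    """One-sided Chebyshev (Cantelli) bound on the fraction of values >= t,
    given only their mean and std. Uninformative (1.0) when t <= mu."""
    if t <= mu:
        return 1.0
    d = t - mu
    return sigma ** 2 / (sigma ** 2 + d ** 2)


def count_at_least_upper_bound(summaries, t):
    """Upper bound on how many raw values are >= t, using summaries only."""
    return sum(n * cantelli_upper_bound(mu, sigma, t) for mu, sigma, n in summaries)


def combine_precisions(precisions, weights):
    """Weighted average of per-resolution precisions (weights are assumed)."""
    return sum(p * w for p, w in zip(precisions, weights)) / sum(weights)


# Usage on toy data (not from the paper):
data = [1.0, 2.0, 3.0, 4.0, 10.0]
bound = count_at_least_upper_bound(block_summaries(data, 5), 9.0)
# bound is roughly 1.43; the true count of values >= 9.0 is 1
```

Because `count_at_least_upper_bound` reads only the stored `(mean, std, count)` summaries, its cost grows with the number of blocks rather than the number of data points; larger blocks make the query cheaper but typically loosen the bound.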

Bibliographic details

Source: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2002), 2002, pp. 488-494 (7 pages)
Conference location: Edmonton
Authors: Tina Eliassi-Rad; Terence Critchlow; Ghaleb Abdulla
Original format: PDF
CLC classification: Computing technology, computer technology; Data processing, data processing systems
Keywords: statistical modeling

Similar documents

1. Statistical modelling and direct numerical simulations of decaying stably stratified turbulence. Part 2. Large-scale and small-scale anisotropy [J]. Godeferd F. S., Staquet C. Journal of Fluid Mechanics. 2003, No. 0

2. The statistical analysis of multivariate failure time data: A marginal modeling approach, Ross L. Prentice, Shanshan Zhao, Boca Raton, FL: CRC Press [J]. Lin D. Y. Biometrics: Journal of the Biometric Society: An International Society Devoted to the Mathematical and Statistical Aspects of Biology. 2019, No. 4

3. Comparison of statistical and machine learning models for healthcare cost data: a simulation study motivated by Oncology Care Model (OCM) data [J]. Madhu Mazumdar, Jung-Yi Joyce Lin, Wei Zhang. BMC Health Services Research. 2020, No. 1

4. Statistical Modeling of Large-Scale Simulation Data [C]. Tina Eliassi-Rad, Terence Critchlow, Ghaleb Abdulla. Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Jul 23-26, 2002, Edmonton. 2002

5. An Exploration of Statistical Modelling Methods on Simulation Data Case Study: Biomechanical Predator-Prey Simulations [D]. Seto, Christian. 2018

6. Comparison of statistical and machine learning models for healthcare cost data: a simulation study motivated by Oncology Care Model (OCM) data [O]. Madhu Mazumdar, Jung-Yi Joyce Lin, Wei Zhang. 2020

7. Statistical Modeling of Large-Scale Simulation Data [O]. Tina Eliassi-Rad, Terence Critchlow. 2002

8. Statistical Modeling of Large-Scale Simulation Data [R]. Eliassi-Rad, T., Critchlow, T., Abdulla, G. 2002
