Evaluation of Combining Bootstrap with Multiple Imputation Using R on Knights Landing Platform

Abstract

Cloud computing and big data technologies are converging to offer a cost-effective delivery model for cloud-based big data analytics. Although the impact of data size and scaling on cloud performance has been studied extensively, the effect of the complexity of the underlying analytic methods has received less attention. This paper develops and evaluates a computationally intensive statistical methodology for inference in the presence of both non-Gaussian data and missing data. The methodology combines two well-established statistical approaches, the bootstrap and multiple imputation (MI). The bootstrap is a computer-based nonparametric resampling procedure that randomly resamples the data many thousands of times to construct an empirical distribution, which is then used to construct confidence intervals for significance tests. This technique enables scientists who study data with known non-normality to obtain higher-quality significance tests than are possible with traditional asymptotic, normal-theory-based tests. However, the bootstrap procedure works only when no data are missing or the data are missing completely at random (MCAR). Missing data can lead to biased estimates when the MCAR assumption is violated, and it is unclear how best to implement a bootstrap procedure in the presence of missing data. The proposed methods will provide guidelines and procedures that enable researchers to use the technique in all areas of health, behavioral, and developmental science in which a study has missing data and cannot rely on parametric inference. Either bootstrapping or MI alone can be computationally expensive, and combining the two can further increase computation costs in the cloud. Using carefully constructed simulation examples, we demonstrate that it is feasible to implement the proposed methodology on a high-performance Knights Landing platform. However, the computation costs are substantial even with small data sizes. Further work is needed to study the effects of optimizing the implementation and its performance with big data.
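To make the combined procedure concrete, the following is a minimal sketch of one way to nest multiple imputation inside a bootstrap loop. It is written in Python for illustration, not in R as used in the paper. Several things here are assumptions: the function names `impute_once` and `bootstrap_mi_ci` are hypothetical, the "resample first, then impute" ordering is only one of several possible orderings (the abstract does not say which one the authors use), and the random hot-deck imputation is a simple stand-in for a real imputation model.

```python
import random
import statistics


def impute_once(sample, rng):
    """Fill each missing value (None) with a random draw from the observed values.

    Assumption: a simple hot-deck imputation, chosen for brevity. It is not
    the imputation model used in the paper.
    """
    observed = [x for x in sample if x is not None]
    return [x if x is not None else rng.choice(observed) for x in sample]


def bootstrap_mi_ci(data, n_boot=500, n_imp=5, alpha=0.05, seed=0):
    """Percentile confidence interval for the mean of data with missing values."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Bootstrap step: draw n entries with replacement, missing ones included.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        if all(x is None for x in resample):
            continue  # nothing observed to impute from; skip this resample
        # MI step: impute the resample n_imp times and average the estimates.
        means = [statistics.fmean(impute_once(resample, rng)) for _ in range(n_imp)]
        estimates.append(statistics.fmean(means))
    # Percentile interval from the empirical distribution of the estimates.
    estimates.sort()
    lo = estimates[int((alpha / 2) * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi


# Usage: skewed (exponential) data with about 20% of values removed at random.
gen = random.Random(42)
data = [gen.expovariate(1.0) for _ in range(60)]
data = [None if gen.random() < 0.2 else x for x in data]
lo, hi = bootstrap_mi_ci(data)
print(f"95% percentile interval for the mean: ({lo:.3f}, {hi:.3f})")
```

The nested loops perform `n_boot * n_imp` imputations and estimates, which illustrates why the combined approach is costly even for small data sets. The bootstrap replicates are independent of one another, so they are a natural unit of work to distribute across cores.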


Bibliographic details

Source
IEEE International Conference of Scalable and Smart Cloud; IEEE International Conference on Cyber Security and Cloud Computing | 2017 | pp. 14-17 | 4 pages
Authors
Chuan Zhou; Yuxiang Gao; Waylon Howard
Original format: PDF
Keywords
Cloud computing; Servers; Big Data; Computational modeling; Sociology; Statistics

