Conference: IEEE International Conference of Scalable and Smart Cloud; IEEE International Conference on Cyber Security and Cloud Computing

Evaluation of Combining Bootstrap with Multiple Imputation Using R on Knights Landing Platform



Abstract

Cloud computing and big data technologies are converging to offer a cost-effective delivery model for cloud-based big data analytics. Although the impact of data size and scaling on cloud performance has been studied extensively, the effect of the complexity of the underlying analytic methods has received less attention. This paper develops and evaluates a computationally intensive statistical methodology for inference in the presence of both non-Gaussian data and missing data. The methodology combines two well-established statistical approaches: the bootstrap and multiple imputation (MI). The bootstrap is a computer-based nonparametric resampling procedure that randomly resamples the data many thousands of times to construct an empirical distribution, which is then used to construct confidence intervals for significance tests. This technique enables scientists who study data with known non-normality to obtain higher-quality significance tests than are possible with a traditional asymptotic, normal-theory-based significance test. However, the bootstrap procedure works only when no data are missing or when the data are missing completely at random (MCAR). When the MCAR assumption is violated, missing data can lead to biased estimates, and it is unclear how best to implement a bootstrap procedure in that case. The proposed methods will provide guidelines and procedures that enable researchers to use the technique in all areas of health, behavioral, and developmental science in which a study has missing data and cannot rely on parametric inference. Either bootstrapping or MI can be computationally expensive, and combining the two can lead to further computation costs in the cloud. Using carefully constructed simulation examples, we demonstrate that it is feasible to implement the proposed methodology on a high-performance Knights Landing platform. However, the computation costs are substantial even with a small data size. Further work is needed to study the effects of optimizing the implementation and its performance with big data.
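To make the combination concrete, the following is a minimal Python sketch of one way to nest imputation inside a bootstrap loop. It is not the authors' implementation (the paper used R, and the abstract does not say how the two steps are ordered or which imputation model is used). Everything named here is an assumption for illustration: the functions `hot_deck_impute` and `bootstrap_mi_ci`, the toy data, the choice to resample first and impute second, the random hot-deck imputation (a stand-in that is only reasonable under MCAR), and pooling by simple averaging.

```python
import random
import statistics


def hot_deck_impute(sample, rng):
    """Fill each missing value (None) with a random draw from the observed values.

    Assumption: a deliberately simple stand-in for a real imputation model.
    """
    observed = [x for x in sample if x is not None]
    return [x if x is not None else rng.choice(observed) for x in sample]


def bootstrap_mi_ci(data, n_boot=1000, n_imp=5, alpha=0.05, seed=0):
    """Percentile interval for the mean: resample with replacement, then impute
    each resample n_imp times and pool the n_imp means by averaging."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Bootstrap step: draw n observations with replacement (missing ones included).
        resample = [rng.choice(data) for _ in range(n)]
        if all(x is None for x in resample):
            continue  # degenerate resample with nothing to impute from
        # Imputation step: n_imp completed data sets per resample.
        imputed_means = [
            statistics.fmean(hot_deck_impute(resample, rng)) for _ in range(n_imp)
        ]
        estimates.append(statistics.fmean(imputed_means))
    estimates.sort()
    lower = estimates[int((alpha / 2) * len(estimates))]
    upper = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lower, upper


# Hypothetical skewed sample with three missing values.
data = [1.2, 0.4, None, 3.9, 0.8, None, 7.5, 0.3, 1.1, 2.6, None, 0.9]
lo, hi = bootstrap_mi_ci(data, n_boot=2000, n_imp=5)
print(f"95% percentile interval for the mean: ({lo:.2f}, {hi:.2f})")
```

In this nesting the statistic is computed n_boot × n_imp times (10,000 times in the call above), which illustrates why the combined procedure can be costly even when the data set itself is small.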
