IEEE International Conference on Cyber Security and Cloud Computing

Evaluation of Combining Bootstrap with Multiple Imputation Using R on Knights Landing Platform

Abstract

Cloud computing and big data technologies are converging to offer a cost-effective delivery model for cloud-based big data analytics. Although the impacts of the size and scaling of big data on the cloud have been extensively studied, the effects of the complexity of the underlying analytic methods on cloud performance have received less attention. This paper develops and evaluates a computationally intensive statistical methodology for performing inference in the presence of both non-Gaussian data and missing data. The methodology combines two well-established statistical approaches, the bootstrap and multiple imputation (MI). The bootstrap is a computer-based nonparametric resampling procedure that randomly resamples the data many thousands of times to construct an empirical distribution, which is then used to construct confidence intervals for significance tests. This technique enables scientists who study data with known non-normality to obtain higher-quality significance tests than are possible with a traditional asymptotic, normal-theory-based significance test. However, the bootstrap procedure only works when no data are missing or when the data are missing completely at random (MCAR). Missing data can lead to biased estimates when the MCAR assumption is violated, and it is unclear how best to implement a bootstrap procedure in the presence of missing data. The proposed methods will provide guidelines and procedures that enable researchers to use the technique in all areas of health, behavioral, and developmental science in which a study has missing data and cannot rely on parametric inference. Either bootstrapping or MI alone can be computationally expensive, and combining the two can lead to further computation costs in the cloud. Using carefully constructed simulation examples, we demonstrate that it is feasible to implement the proposed methodology on a high-performance Knights Landing platform. However, the computation costs are substantial even with small data sizes. Further studies are needed to examine the effects of optimizing the implementation and its performance with big data.
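To make the cost structure described in the abstract concrete, the following is a minimal sketch of one way to nest imputation inside a bootstrap loop. It is not the paper's procedure or its R implementation: the resample-then-impute ordering, the simple random hot-deck imputation (a crude stand-in for a proper MI model), the percentile interval, and all names (`impute_hot_deck`, `bootstrap_mi_ci`, `n_boot`, `n_imp`) are assumptions chosen for illustration, written in Python with the standard library only.

```python
import random
import statistics


def impute_hot_deck(sample, rng):
    """Fill each missing value (None) with a random draw from the observed values.

    Assumption: a simple hot-deck draw stands in for a real imputation model.
    """
    observed = [x for x in sample if x is not None]
    if not observed:
        raise ValueError("sample has no observed values to impute from")
    return [x if x is not None else rng.choice(observed) for x in sample]


def bootstrap_mi_ci(data, n_boot=1000, n_imp=5, alpha=0.05, seed=0):
    """Percentile confidence interval for the mean: resample first, then impute."""
    if all(x is None for x in data):
        raise ValueError("data contains no observed values")
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    while len(estimates) < n_boot:
        # Bootstrap step: draw n rows with replacement, missing values included.
        resample = [rng.choice(data) for _ in range(n)]
        if all(x is None for x in resample):
            continue  # nothing to impute from; redraw this resample
        # Imputation step: impute the resample n_imp times and average the means.
        imputed_means = [
            statistics.fmean(impute_hot_deck(resample, rng)) for _ in range(n_imp)
        ]
        estimates.append(statistics.fmean(imputed_means))
    estimates.sort()
    # Symmetric percentile cut-offs on the sorted bootstrap estimates.
    lo_idx = int(n_boot * alpha / 2)
    hi_idx = n_boot - 1 - lo_idx
    return estimates[lo_idx], estimates[hi_idx]


rng = random.Random(42)
# Skewed (exponential) data with roughly 20% of values removed completely at random.
data = [rng.expovariate(1.0) if rng.random() > 0.2 else None for _ in range(50)]
lo, hi = bootstrap_mi_ci(data)
print(f"95% percentile CI for the mean: ({lo:.3f}, {hi:.3f})")
```

Even in this toy version the analysis runs `n_boot * n_imp` times (5,000 imputed data sets for 50 observations), which illustrates why the combined procedure is expensive on small data. Each bootstrap replicate is independent of the others, so the outer loop is the natural place to parallelize on a many-core platform.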