Wiley Interdisciplinary Reviews: Computational Statistics
Bootstraps, permutation tests, and sampling orders of magnitude faster using SAS®

Abstract

While permutation tests and bootstraps have very wide-ranging applications, both share a common potential drawback: as data-intensive resampling methods, both can be runtime-prohibitive when applied to large- or even medium-sized data samples drawn from large datasets. The data explosion of the past few decades has made this a common occurrence, and it highlights the increasing need for faster, more efficient, and more scalable permutation test and bootstrap algorithms. Seven bootstrap and six permutation test algorithms coded in SAS (the largest privately owned software firm globally) are compared herein. The fastest algorithms (‘OPDY’ for the bootstrap, ‘OPDN’ for permutation tests) are new, use no modules beyond Base SAS, and run orders of magnitude faster than the relevant ‘built-in’ SAS procedures: OPDY is over 200× faster than Proc SurveySelect; OPDN is over 240× faster than Proc SurveySelect, over 350× faster than NPAR1WAY (which crashes on datasets less than a tenth the size OPDN can handle), and over 720× faster than Proc Multtest. OPDY is also much faster than hashing, which crashes on datasets smaller (sometimes by orders of magnitude) than those OPDY can handle. OPDY is easily generalizable to multivariate regression models, and OPDN, which uses an extremely efficient draw-by-draw random-sampling-without-replacement algorithm, can use virtually any permutation statistic, so both have a very wide range of applications. Finally, the time complexity of both OPDY and OPDN is sublinear, making them not only the fastest but also the only truly scalable bootstrap and permutation test algorithms, respectively, in SAS.
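To make the two resampling schemes concrete, here is a minimal Python sketch. It is not OPDY or OPDN, which are Base SAS algorithms whose code the abstract does not reproduce. The sketch shows a nonparametric bootstrap of the mean (resampling with replacement) and a two-sample permutation test for a difference in means. In the permutation test, each relabeling is drawn one index at a time without replacement using a partial Fisher-Yates shuffle, a standard technique chosen here only for illustration. All function names (`bootstrap_means`, `draw_without_replacement`, `permutation_test`) are hypothetical.

```python
import random
from statistics import mean


def bootstrap_means(data, n_boot, seed=0):
    """Return n_boot bootstrap replicates of the sample mean (resampling with replacement)."""
    rng = random.Random(seed)
    n = len(data)
    reps = []
    for _ in range(n_boot):
        # Draw n observations with replacement from the original sample.
        sample = [data[rng.randrange(n)] for _ in range(n)]
        reps.append(mean(sample))
    return reps


def draw_without_replacement(pool_size, k, rng):
    """Draw k distinct indices from range(pool_size), one draw at a time (partial Fisher-Yates)."""
    idx = list(range(pool_size))
    for i in range(k):
        # Pick uniformly among the slots not yet drawn, then swap the pick into position i.
        j = rng.randrange(i, pool_size)
        idx[i], idx[j] = idx[j], idx[i]
    return idx[:k]


def permutation_test(x, y, n_perm, seed=0):
    """Two-sided Monte Carlo permutation test for a difference in means; returns a p-value."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n, n_x = len(pooled), len(x)
    total = sum(pooled)
    observed = abs(mean(x) - mean(y))
    hits = 0
    for _ in range(n_perm):
        chosen = draw_without_replacement(n, n_x, rng)
        sum_x = sum(pooled[i] for i in chosen)
        # The other group's sum follows from the pooled total, so only group x is summed.
        diff = abs(sum_x / n_x - (total - sum_x) / (n - n_x))
        # Small tolerance guards against float rounding when diff equals observed.
        if diff >= observed - 1e-12:
            hits += 1
    # Add-one correction counts the observed labeling as one of the permutations.
    return (hits + 1) / (n_perm + 1)
```

For example, `permutation_test(x, y, n_perm=10_000)` returns an add-one-corrected Monte Carlo p-value. This sketch rebuilds the index list on every draw, so each relabeling costs O(n). That keeps the code simple, but it is not tuned for the speed the abstract is concerned with.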