Robust, Scalable, and Fast Bootstrap Method for Analyzing Large Scale Data

Basiri Shahab; Ollila Esa; Koivunen Visa

Abstract

In this paper we address the problem of performing statistical inference for large-scale data sets, i.e., Big Data. The volume and dimensionality of the data may be so high that the data cannot be processed or stored in a single computing node. We propose a scalable, statistically robust, and computationally efficient bootstrap method that is compatible with distributed processing and storage systems. Bootstrap resamples are constructed with a smaller number of distinct data points on multiple disjoint subsets of the data, similarly to the bag of little bootstraps (BLB) method [A. Kleiner, A. Talwalkar, P. Sarkar, and M. I. Jordan, “A scalable bootstrap for massive data,” J. Roy. Statist. Soc.: Ser. B (Statist. Methodol.), vol. 76, no. 4, pp. 795–816, 2014]. The disjoint subsets are significantly smaller than the original full data set, and they may be processed in parallel in different storage and computing units. Significant computational savings are then achieved by avoiding recomputation of the estimator for each bootstrap sample. Instead, a computationally efficient fixed-point estimation equation is solved analytically via a smart approximation, following the Fast and Robust Bootstrap (FRB) method [M. Salibián-Barrera, S. Van Aelst, and G. Willems, “Fast and robust bootstrap,” Statist. Methods Appl., vol. 17, no. 1, pp. 41–71, 2008]. The proposed bootstrap method facilitates the use of highly robust statistical methods in analyzing large-scale data sets. The favorable statistical properties of the method are established analytically. Numerical examples demonstrate the scalability, low complexity, and robust statistical performance of the method in analyzing large data sets.
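The subset-and-resample step described in the abstract can be illustrated with a minimal sketch of the plain BLB idea. This is not the authors' method: the estimator here is the sample mean, chosen only for simplicity, and the FRB fixed-point approximation is not shown. The function name `blb_standard_error` and its parameters are hypothetical.

```python
import numpy as np

def blb_standard_error(x, n_subsets=5, n_resamples=100, seed=0):
    """Illustrative BLB estimate of the standard error of the sample mean.

    Assumption: the estimator is the plain mean, not a robust estimator.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.size
    # Disjoint subsets: shuffle the indices once, then split them.
    subsets = np.array_split(rng.permutation(n), n_subsets)
    per_subset_se = []
    for idx in subsets:
        xs = x[idx]
        b = xs.size
        # Each resample has nominal size n but at most b distinct points;
        # it is stored as multinomial counts over the subset.
        counts = rng.multinomial(n, np.full(b, 1.0 / b), size=n_resamples)
        # Weighted mean of each resample (one row of counts per resample).
        estimates = counts @ xs / n
        per_subset_se.append(estimates.std(ddof=1))
    # Average the per-subset standard errors into one final estimate.
    return float(np.mean(per_subset_se))
```

Each subset could be handled by a separate computing node, since the loop iterations share no state beyond the final averaging step.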


Bibliographic details

Source
IEEE Transactions on Signal Processing | 2016, no. 4 | pp. 1007-1017 | 11 pages
Authors
Basiri Shahab; Ollila Esa; Koivunen Visa
Affiliation

Department of Signal Processing and Acoustics, Aalto University, Espoo

Original format: PDF
Language: English
Keywords
Bag of little bootstraps; big data; bootstrap; distributed computation; fast and robust bootstrap; robust estimation

