Model aggregation for doubly divided data with large size and large dimension

He Baihua; Liu Yanyan; Yin Guosheng; Wu Yuanshan


Abstract

Massive data often feature both high dimensionality and large sample size. Such data typically cannot be stored on a single machine, which makes both analysis and prediction challenging. We propose a distributed gridding model aggregation (DGMA) approach for predicting the conditional mean of a response variable, which overcomes the storage limitation of a single machine and the curse of high dimensionality. Specifically, on each local machine, which stores part of the data with a relatively moderate sample size, we develop a model aggregation approach based on splitting the predictors, for which a greedy algorithm is developed. To obtain the optimal weights across all local machines, we further design a distributed and communication-efficient algorithm. Our procedure effectively distributes the workload and dramatically reduces the communication cost. Extensive numerical experiments on both simulated and real datasets demonstrate the feasibility of the DGMA method.
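The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of doubly divided data (rows split across machines, predictors split into blocks), not the authors' DGMA procedure. It assumes least-squares sub-models on each predictor block, a simple forward greedy averaging step for the local weights, and a plain average of the local coefficient vectors as a stand-in for the paper's distributed weight-optimization step. All function names, sizes, and the simulated data are hypothetical.

```python
import numpy as np

def fit_block_models(X, y, blocks):
    """Fit one least-squares sub-model per predictor block (assumed sub-model type)."""
    return [np.linalg.lstsq(X[:, cols], y, rcond=None)[0] for cols in blocks]

def greedy_weights(preds, y, n_steps=50):
    """Forward greedy averaging over candidate predictions (columns of preds).

    At step t the current fit moves a fraction 1/t toward the single candidate
    that gives the smallest squared error, so the weights stay non-negative
    and sum to one.
    """
    w = np.zeros(preds.shape[1])
    fitted = np.zeros_like(y)
    for t in range(1, n_steps + 1):
        step = 1.0 / t
        cand = (1.0 - step) * fitted[:, None] + step * preds
        losses = ((y[:, None] - cand) ** 2).sum(axis=0)
        j = int(np.argmin(losses))
        w *= 1.0 - step
        w[j] += step
        fitted = cand[:, j]
    return w

def local_aggregate(X, y, blocks, n_steps=50):
    """Work done on one machine: block sub-models plus greedy aggregation weights."""
    coefs = fit_block_models(X, y, blocks)
    preds = np.column_stack([X[:, c] @ b for c, b in zip(blocks, coefs)])
    w = greedy_weights(preds, y, n_steps)
    beta = np.zeros(X.shape[1])
    for cols, b, wk in zip(blocks, coefs, w):
        beta[cols] = wk * b  # blocks are disjoint, so this is the aggregated model
    return beta, w

def run_demo(seed=0):
    """Simulate data, split it by rows and by columns, and combine the local fits."""
    rng = np.random.default_rng(seed)
    n, p, n_machines, n_blocks = 1200, 40, 4, 8
    X = rng.normal(size=(n, p))
    beta_true = np.zeros(p)
    beta_true[:5] = 1.0  # only the first predictor block carries signal
    y = X @ beta_true + rng.normal(size=n)
    blocks = np.array_split(np.arange(p), n_blocks)
    row_parts = np.array_split(np.arange(n), n_machines)
    results = [local_aggregate(X[r], y[r], blocks) for r in row_parts]
    # Each machine sends one length-p vector; equal weighting is an assumption here.
    beta_avg = np.mean([b for b, _ in results], axis=0)
    weights = np.array([w for _, w in results])
    return beta_avg, weights

if __name__ == "__main__":
    beta_avg, weights = run_demo()
    print(np.round(weights, 2))
```

In this sketch each machine touches only its own rows and communicates a single coefficient vector, which illustrates why such schemes can keep storage and communication costs low; the paper's actual weighting across machines is described as optimized rather than equal.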

Bibliographic details

Source: Computational Statistics, 2023, Issue 1, pp. 509-529 (21 pages)
Authors: He Baihua; Liu Yanyan; Yin Guosheng; Wu Yuanshan
Affiliations: Zhongnan Univ Econ & Law; Univ Hong Kong
Original format: PDF
Language: English
Chinese Library Classification: Probability theory and mathematical statistics
Keywords: Communication efficiency; Computation complexity; Distributed algorithm; Greedy algorithm; High dimension; One-shot approach; Prediction; Storage ability; AVERAGING APPROACH; COMBINATION
