Journal: Computational Statistics

Model aggregation for doubly divided data with large size and large dimension


Abstract

Massive data sets often combine high dimensionality with large sample size; they typically cannot be stored on a single machine, which makes both analysis and prediction challenging. We propose a distributed gridding model aggregation (DGMA) approach for predicting the conditional mean of a response variable, which overcomes both the storage limitation of a single machine and the curse of dimensionality. Specifically, on each local machine, which stores a portion of the data with a relatively moderate sample size, we develop a model aggregation approach that splits the predictors and relies on a greedy algorithm. To obtain the optimal weights across all local machines, we further design a distributed, communication-efficient algorithm. The procedure distributes the workload effectively and dramatically reduces the communication cost. Extensive numerical experiments on both simulated and real data sets demonstrate the feasibility of the DGMA method.
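The abstract gives no algorithmic detail, so the following is only a rough sketch of the general idea of "doubly divided" data (rows split across machines, predictors split into blocks), not the DGMA procedure itself. It assumes least-squares block models, a generic greedy convex aggregation with step size 2/(k+1), and plain one-shot averaging across machines as a stand-in for the paper's communication-efficient weight algorithm. All function names (`fit_block_models`, `greedy_weights`, `fit_local`, `predict_averaged`) are hypothetical.

```python
import numpy as np


def fit_block_models(X, y, n_blocks):
    """Split predictor columns into blocks; fit one least-squares model per block."""
    blocks = np.array_split(np.arange(X.shape[1]), n_blocks)
    coefs = [np.linalg.lstsq(X[:, b], y, rcond=None)[0] for b in blocks]
    return blocks, coefs


def block_predictions(X, blocks, coefs):
    """Return an (n_samples, n_blocks) matrix of per-block predictions."""
    return np.column_stack([X[:, b] @ c for b, c in zip(blocks, coefs)])


def greedy_weights(P, y, n_steps=50):
    """Greedy convex aggregation (assumed scheme, not taken from the paper).

    At step k the current fit moves a fraction 2/(k+1) toward the single
    candidate column of P that gives the smallest squared error, so the
    weights stay nonnegative and sum to one.
    """
    n, m = P.shape
    w = np.zeros(m)
    fitted = np.zeros(n)
    for k in range(1, n_steps + 1):
        alpha = 2.0 / (k + 1)  # alpha = 1 at k = 1: start from the best single model
        cand = (1.0 - alpha) * fitted[:, None] + alpha * P
        j = int(np.argmin(((y[:, None] - cand) ** 2).sum(axis=0)))
        w *= 1.0 - alpha
        w[j] += alpha
        fitted = cand[:, j]
    return w


def fit_local(X, y, n_blocks):
    """Work done on one machine: block models plus greedy weights.

    For brevity the weights are fitted on the same rows as the block
    models; a held-out split would be the more careful choice.
    """
    blocks, coefs = fit_block_models(X, y, n_blocks)
    w = greedy_weights(block_predictions(X, blocks, coefs), y)
    return blocks, coefs, w


def predict_averaged(local_fits, X_new):
    """Average the locally aggregated predictions (simple one-shot stand-in)."""
    preds = [block_predictions(X_new, b, c) @ w for b, c, w in local_fits]
    return np.mean(preds, axis=0)
```

In this sketch each machine would call `fit_local` on its own rows and send back only its block coefficients and weight vector, so no raw data leaves the machine; `predict_averaged` then combines the local fits on new inputs.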
