Annual Conference on Neural Information Processing Systems

Communication-Efficient Algorithms for Statistical Optimization


Abstract

We study two communication-efficient algorithms for distributed statistical optimization on large-scale data. The first algorithm is an averaging method that distributes the N data samples evenly to m machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves mean-squared error that decays as O(N^{-1} + (N/m)^{-2}). Whenever m ≤ N^{1/2}, this guarantee matches the best possible rate achievable by a centralized algorithm having access to all N samples. The second algorithm is a novel method, based on an appropriate form of the bootstrap. Requiring only a single round of communication, it has mean-squared error that decays as O(N^{-1} + (N/m)^{-3}), and so is more robust to the amount of parallelization. We complement our theoretical results with experiments on large-scale problems from the internet search domain. In particular, we show that our methods efficiently solve an advertisement prediction problem from the Chinese SoSo Search Engine, which consists of N ≈ 2.4 × 10^8 samples and d ≥ 700,000 dimensions.
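For orientation, a minimal sketch of the averaging scheme the abstract describes follows: the N samples are split across m machines, each machine minimizes its own empirical objective, and the local estimates are averaged. The function names, the least-squares local objective, and the data sizes below are illustrative assumptions, not the authors' implementation or experimental setup.

import numpy as np

def average_mixture(data_splits, local_minimize):
    """Average the local estimates produced on each machine's data subset."""
    local_estimates = [local_minimize(X, y) for X, y in data_splits]
    return np.mean(local_estimates, axis=0)

def least_squares(X, y):
    """Hypothetical local solver: ordinary least squares on one subset."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Illustrative data: m = 8 machines, N/m = 200 samples per machine, d = 5.
rng = np.random.default_rng(0)
theta_true = rng.normal(size=5)
splits = []
for _ in range(8):
    X = rng.normal(size=(200, 5))
    y = X @ theta_true + 0.1 * rng.normal(size=200)
    splits.append((X, y))

theta_avg = average_mixture(splits, least_squares)
print(np.linalg.norm(theta_avg - theta_true))

The bootstrap-based second algorithm mentioned in the abstract also uses only one round of communication; its exact combination rule is not spelled out here, so consult the paper for details.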
