IEEE Conference on Decision and Control

Finite-Sample Analysis of Multi-Agent Policy Evaluation with Kernelized Gradient Temporal Difference


Abstract

In this work we provide a finite-sample analysis of a distributed gradient temporal difference algorithm for policy evaluation with value functions that lie in a Reproducing Kernel Hilbert Space (RKHS). The focus is on multi-agent systems in which each agent observes a private reward and agents can only communicate with nearby neighbors over time-varying networks. The main result is a time-evolving upper bound on the second-order error statistics of the algorithm, which accounts for the evolution of the consensus error as well as the average approximation error. This result shows that the distributed learning algorithm under consideration can achieve a bounded final error covariance that is inversely proportional to the algorithm step size, which is consistent with results in the more general field of stochastic approximation.
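To make the setting concrete, the sketch below illustrates one plausible form of such a distributed kernelized GTD update: each agent runs a GTD2-style two-time-scale update using its own private reward, with the value function parameterized over a fixed Gaussian-kernel dictionary, followed by a consensus-averaging step with neighbors through a time-varying doubly stochastic mixing matrix. This is a minimal sketch under assumed choices, not the authors' algorithm: the toy random-walk dynamics, the fixed dictionary standing in for the paper's RKHS machinery, the alternating-edge mixing schedule, and all names and step sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Illustrative constants (assumptions, not values from the paper) ---
N_AGENTS = 4                      # agents on a time-varying path graph
GAMMA = 0.9                       # discount factor
ALPHA, BETA = 0.05, 0.1           # primal / auxiliary step sizes
DICT = np.linspace(0.0, 1.0, 10)  # fixed dictionary of states for the kernel expansion
BW = 0.2                          # Gaussian kernel bandwidth

def features(s):
    """Kernel features: evaluations of k(s, .) against the fixed dictionary."""
    return np.exp(-((s - DICT) ** 2) / (2 * BW ** 2))

def mixing_matrix(t):
    """Doubly stochastic weights for a time-varying graph: alternating edge sets."""
    W = np.eye(N_AGENTS)
    for i in range(t % 2, N_AGENTS - 1, 2):  # activate every other edge
        W[i, i] = W[i + 1, i + 1] = 0.5
        W[i, i + 1] = W[i + 1, i] = 0.5
    return W

theta = np.zeros((N_AGENTS, DICT.size))  # per-agent value-function weights
w = np.zeros_like(theta)                 # per-agent auxiliary GTD2 weights

s = 0.5
for t in range(5000):
    # Fixed-policy dynamics: a bounded random walk on [0, 1] (toy example).
    s_next = float(np.clip(s + rng.normal(0.0, 0.1), 0.0, 1.0))
    phi, phi_next = features(s), features(s_next)
    for i in range(N_AGENTS):
        r_i = s_next + rng.normal(0.0, 0.05)  # private reward seen by agent i
        delta = r_i + GAMMA * theta[i] @ phi_next - theta[i] @ phi
        # GTD2-style two-time-scale update restricted to the dictionary span.
        w[i] += BETA * (delta - phi @ w[i]) * phi
        theta[i] += ALPHA * (phi - GAMMA * phi_next) * (phi @ w[i])
    # Consensus step: average parameters with the current neighbors.
    W = mixing_matrix(t)
    theta, w = W @ theta, W @ w
    s = s_next

consensus_err = np.linalg.norm(theta - theta.mean(axis=0), axis=1).max()
print(f"max consensus error: {consensus_err:.4f}")
print(f"agent 0 estimate of V(0.5): {theta[0] @ features(0.5):.3f}")
```

The final check mirrors the two quantities tracked by the bound described in the abstract: the disagreement between agents' parameter vectors (the consensus error, driven down by the mixing step) and each agent's remaining distance to the common value-function estimate (the average approximation error).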
