

Finite-Sample Analysis of Multi-Agent Policy Evaluation with Kernelized Gradient Temporal Difference



Abstract

This work provides a finite-sample analysis of a distributed gradient temporal difference algorithm for policy evaluation with value functions that lie in a reproducing kernel Hilbert space (RKHS). It focuses on multi-agent systems in which each agent observes a private reward and agents can communicate only with nearby neighbors over time-varying networks. The main result is a time-evolving upper bound on the second-order error statistics of the algorithm, which accounts for the evolution of both the consensus error and the average approximation error. This result shows that the distributed learning algorithm under consideration can achieve a bounded final error covariance that is inversely proportional to the algorithm step-size, which is consistent with results in the more general field of stochastic approximation.


Bibliographic details

Source
IEEE Conference on Decision and Control | 2020 | pp. 5647-5652 | 6 pages
Authors
Paulo Heredia; Shaoshuai Mou
Original format
PDF
Keywords
Approximation algorithms; Kernel; Reinforcement learning; Upper bound; Dictionaries; Convergence; Hilbert space


