Direct Policy Search Reinforcement Learning Based on Variational Bayesian Inference

Abstract

Direct policy search is a promising reinforcement learning framework, particularly for controlling continuous, high-dimensional systems. One such direct policy search method is reward-weighted regression (RWR), proposed by Peters et al. The RWR algorithm estimates the policy parameters with the EM algorithm and is therefore prone to overfitting. In this paper, we employ variational Bayesian inference to avoid the overfitting problem and propose direct policy search reinforcement learning based on variational Bayesian inference (VBRL). The performance of the proposed VBRL is assessed in several experiments on mountain-car and ball-batting tasks. These experiments show that VBRL yields a higher average return and outperforms RWR.
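The EM-style update underlying RWR can be illustrated with a minimal sketch: sample policy parameters from a Gaussian search distribution, weight each rollout by an exponential transformation of its return, and refit the distribution by weighted maximum likelihood. This is only a toy illustration under stated assumptions; the objective function, the inverse temperature `beta`, and all variable names are hypothetical stand-ins, not the paper's actual tasks or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def episode_return(theta):
    # Toy stand-in for an episode's return; the paper's actual
    # benchmarks are mountain-car and ball-batting tasks.
    return -np.sum((theta - 1.0) ** 2)

# Gaussian search distribution over policy parameters.
mu, sigma = np.zeros(2), np.ones(2)
beta = 1.0  # inverse temperature for the exponential reward weighting

for _ in range(100):
    thetas = rng.normal(mu, sigma, size=(50, 2))   # sample rollouts
    returns = np.array([episode_return(t) for t in thetas])
    w = np.exp(beta * (returns - returns.max()))   # reward weights
    w /= w.sum()
    # Weighted maximum-likelihood (EM-style) update of the policy:
    mu = w @ thetas
    sigma = np.sqrt(w @ (thetas - mu) ** 2 + 1e-6)

print(np.round(mu, 2))
```

Because this update is a point estimate of the weighted likelihood, the search distribution can collapse onto noisy rollouts; the paper's VBRL replaces it with a variational Bayesian posterior over the policy parameters to regularize against such overfitting.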
