Credit Assignment Method for Learning Effective Stochastic Policies in Uncertain Domains

Abstract

In this paper, we introduce FirstVisit Profit-sharing (FVPS), a procedure for credit assignment, which is an important issue in classifier systems and reinforcement learning frameworks. FVPS reinforces effective rules so that an agent acquires stochastic policies that make it behave robustly in uncertain domains, without pre-defined knowledge or subgoals. We use an internal episodic memory not only to identify perceptually aliased states but also to discard looping behavior and to acquire effective stochastic policies for escaping perceptually deceptive states. We demonstrate the effectiveness of our method on some typical classes of Partially Observable Markov Decision Processes, comparing it with Sarsa(λ) using a replacing eligibility trace. We claim that this approach results in an effective stochastic or deterministic policy, whichever is appropriate for the environment.
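
The abstract does not give the update rule itself, so the following is only a minimal sketch of the general idea: an episodic memory of fired rules is replayed when a reward arrives, and each rule is credited once per episode so that looping back to the same rule earns nothing extra. The function names, the geometric `decay` credit function, and the weight-proportional action selection are all assumptions made for illustration, not details taken from the paper.

```python
def reinforce_first_visits(weights, episode, reward, decay=0.5):
    """Add credit to each (observation, action) rule once per episode.

    `episode` lists the rules in the order they fired; the last one
    earned `reward`. Repeat firings of a rule (loops) get no credit.
    `decay` is an assumed geometric credit function, not the paper's.
    """
    last = len(episode) - 1
    credited = set()
    for step, rule in enumerate(episode):
        if rule in credited:
            continue  # later visits of the same rule are ignored
        credited.add(rule)
        # Credit shrinks with the distance of the first visit from the reward.
        weights[rule] = weights.get(rule, 0.0) + reward * decay ** (last - step)
    return weights


def action_probabilities(weights, observation, actions):
    """Assumed stochastic policy: pick actions in proportion to rule weights."""
    scores = [weights.get((observation, a), 0.0) for a in actions]
    total = sum(scores)
    if total == 0:
        # No rule for this observation has been reinforced yet: act uniformly.
        return [1.0 / len(actions)] * len(actions)
    return [s / total for s in scores]


# Hypothetical episode in which the rule ("s0", "a") fires twice (a loop).
weights = reinforce_first_visits(
    {}, [("s0", "a"), ("s1", "b"), ("s0", "a"), ("s2", "c")], reward=8.0)
```

In this sketch the second firing of `("s0", "a")` is skipped, so that rule is credited only for its first visit, three steps before the reward.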

Bibliographic record

Source: Genetic and evolutionary computation conference, 2001, 8 pages
Authors: Sachiyo Arai; Katia Sycara
Original format: PDF
CLC classification: other receptors

Similar literature

1. Effective Methods for Reinforcement Learning in Large Multi-Agent Domains [J]. Martin Riedmiller, Daniel Withopf. Information Technology. 2005, No. 5
2. Training a robust reinforcement learning controller for the uncertain system based on policy gradient method [J]. Li Zhan, Xue Shengri, Lin Weiyang. Neurocomputing. 2018, No. NOVa17
3. PP-PG: Combining Parameter Perturbation with Policy Gradient Methods for Effective and Efficient Explorations in Deep Reinforcement Learning [J]. Li Shilei, Li Meng, Su Jiongming. ACM transactions on intelligent systems and technology. 2021, No. 3
4. Credit Assignment Method for Learning Effective Stochastic Policies in Uncertain Domains [C]. Sachiyo Arai, Katia Sycara. Genetic and evolutionary computation conference. 2001
5. Stochastic Explanations: Learning From Mistakes In Stochastic Domains [D]. Finestrali, Giulio. 2013
6. Desirability availability credit assignment category learning and attention: Cognitive-emotional and working memory dynamics of orbitofrontal ventrolateral and dorsolateral prefrontal cortices [O]. Stephen Grossberg. 2018
7. A Reinforcement Learning Method with the Inference of the Other Agent's Policy for 2-Player Stochastic Games [O]. 長行康男, 伊藤実. 2003
8. Solving the Credit Assignment Problem: The Interaction of Explicit and Implicit Learning with Internal and External State Information [R]. Fu, W., Anderson, J. R. 2006
