Teaching AI Agents Ethical Values Using Reinforcement Learning and Policy Orchestration (Extended Abstract)


Abstract

Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they behave in ways aligned with the values of society, we must develop techniques that allow these agents not only to maximize their reward in an environment, but also to learn and follow the implicit constraints of society. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations and reinforcement learning to learn to maximize environmental rewards. A contextual bandit-based orchestrator then picks between the two policies: constraint-based and environment reward-based. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either a reward-maximizing or constrained policy. In addition, the orchestrator is transparent about which policy is being employed at each time step. We test our algorithms using Pac-Man and show that the agent is able to learn to act optimally, act within the demonstrated constraints, and mix these two functions in complex ways.
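The orchestration step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which contextual bandit algorithm is used, so the code below substitutes a simple epsilon-greedy bandit with one linear payoff estimate per policy. Everything in it is an assumption made for illustration: the class name `Orchestrator`, the two stub policies (stand-ins for the policies learned by inverse reinforcement learning and reinforcement learning), the two-feature context, and the toy payoff signal.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A policy maps a context (state features) to an action label.
Policy = Callable[[Sequence[float]], str]


class Orchestrator:
    """Epsilon-greedy linear contextual bandit whose arms are policies.

    Assumption: this is a stand-in for the bandit named in the abstract,
    not the authors' algorithm.
    """

    def __init__(self, policies: List[Policy], n_features: int,
                 epsilon: float = 0.1, lr: float = 0.05, seed: int = 0):
        self.policies = policies
        self.epsilon = epsilon
        self.lr = lr
        self.rng = random.Random(seed)
        # One linear payoff estimate (weight vector) per arm.
        self.weights = [[0.0] * n_features for _ in policies]

    def _estimate(self, arm: int, context: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(self.weights[arm], context))

    def choose(self, context: Sequence[float]) -> Tuple[int, str]:
        """Return (arm, action); the arm index reports which policy acted."""
        if self.rng.random() < self.epsilon:
            arm = self.rng.randrange(len(self.policies))  # explore
        else:
            arm = max(range(len(self.policies)),
                      key=lambda a: self._estimate(a, context))  # exploit
        return arm, self.policies[arm](context)

    def update(self, arm: int, context: Sequence[float], payoff: float) -> None:
        """One stochastic gradient step on the squared prediction error."""
        error = payoff - self._estimate(arm, context)
        self.weights[arm] = [w + self.lr * error * x
                             for w, x in zip(self.weights[arm], context)]


def constrained_policy(context: Sequence[float]) -> str:
    return "avoid"  # hypothetical stub for the constraint-based policy


def reward_policy(context: Sequence[float]) -> str:
    return "eat"  # hypothetical stub for the reward-maximizing policy


def toy_payoff(arm: int, context: Sequence[float]) -> float:
    """Toy signal (assumption): the constrained arm pays off only in danger."""
    danger = context[1]
    return danger if arm == 0 else 1.0 - danger


def train_toy_orchestrator(steps: int = 5000, seed: int = 0) -> Orchestrator:
    orch = Orchestrator([constrained_policy, reward_policy],
                        n_features=2, seed=seed)
    rng = random.Random(seed + 1)
    for _ in range(steps):
        context = [1.0, float(rng.randint(0, 1))]  # [bias, danger flag]
        arm, _action = orch.choose(context)
        orch.update(arm, context, toy_payoff(arm, context))
    orch.epsilon = 0.0  # act greedily once training is done
    return orch


if __name__ == "__main__":
    trained = train_toy_orchestrator()
    for ctx in ([1.0, 1.0], [1.0, 0.0]):
        print(ctx, trained.choose(ctx))
```

Returning the arm index alongside the action is what gives this sketch the transparency property the abstract mentions: at every time step a caller can see which policy produced the action. In this toy setting the orchestrator should come to prefer the constrained policy when the danger flag is set and the reward-maximizing policy otherwise.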


Bibliographic details

Source
International Joint Conference on Artificial Intelligence | 2020 | pp. 5850-6589 | 5 pages
Authors
Ritesh Noothigattu; Djallel Bouneffouf; Nicholas Mattei; Rachita Chandra; Piyush Madan; Kush R. Varshney; Murray Campbell; Moninder Singh; Francesca Rossi
Original format: PDF
Chinese Library Classification: TP18-53

