International Joint Conference on Artificial Intelligence

Teaching AI Agents Ethical Values Using Reinforcement Learning and Policy Orchestration (Extended Abstract)

Abstract

Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they behave in ways aligned with the values of society, we must develop techniques that allow these agents not only to maximize their reward in an environment, but also to learn and follow the implicit constraints of society. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations, and reinforcement learning to learn to maximize environmental rewards. A contextual bandit-based orchestrator then picks between the two policies: constraint-based and environment reward-based. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either a reward-maximizing or constrained policy. In addition, the orchestrator is transparent about which policy is being employed at each time step. We test our algorithms using PacMan and show that the agent is able to learn to act optimally, act within the demonstrated constraints, and mix these two functions in complex ways.
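
The orchestration step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which contextual bandit algorithm is used, so the example below assumes a simple epsilon-greedy bandit with one linear reward estimate per arm. The class name `Orchestrator`, the helper `act`, the arm labels, and the feature-vector context are all hypothetical names introduced for illustration. The two policies are assumed to be already trained (one via inverse reinforcement learning on demonstrations, one via reinforcement learning on environment reward) and are passed in as plain callables.

```python
import random


class Orchestrator:
    """Epsilon-greedy contextual bandit over two arms.

    Assumption: this stands in for the paper's orchestrator, whose exact
    bandit algorithm is not given in the abstract.
    """

    ARMS = ("constrained", "reward")  # hypothetical arm labels

    def __init__(self, n_features, epsilon=0.1, lr=0.05, seed=0):
        self.epsilon = epsilon  # probability of picking a random arm
        self.lr = lr            # step size for the linear estimates
        self.rng = random.Random(seed)
        # One weight vector per arm; each predicts that arm's payoff.
        self.weights = {arm: [0.0] * n_features for arm in self.ARMS}

    def predict(self, arm, context):
        """Estimated payoff of following `arm` in this context."""
        return sum(w * x for w, x in zip(self.weights[arm], context))

    def choose(self, context):
        """Pick which policy to follow at this time step."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.ARMS)
        return max(self.ARMS, key=lambda arm: self.predict(arm, context))

    def update(self, arm, context, reward):
        """Move the chosen arm's estimate toward the observed reward."""
        error = reward - self.predict(arm, context)
        self.weights[arm] = [
            w + self.lr * error * x
            for w, x in zip(self.weights[arm], context)
        ]


def act(orchestrator, context, policies, state):
    """Return (arm, action): the arm label records which policy was used."""
    arm = orchestrator.choose(context)
    return arm, policies[arm](state)
```

Because `act` returns the arm label alongside the action, a caller can log which policy was employed at every time step, which mirrors the transparency property the abstract mentions. After each step the caller would pass the observed feedback to `update` for the arm that was actually played.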