AAAI Conference on Artificial Intelligence

Incorporating Behavioral Constraints in Online AI Systems

Abstract

AI systems that learn through reward feedback about the actions they take are increasingly deployed in domains that have a significant impact on our daily lives. However, in many cases the online rewards should not be the only guiding criterion, as there are additional constraints and/or priorities imposed by regulations, values, preferences, or ethical principles. We detail a novel online agent that learns a set of behavioral constraints by observation and uses these learned constraints as a guide when making decisions in an online setting, while still being reactive to reward feedback. To define this agent, we propose a novel extension to the classical contextual multi-armed bandit setting, and we provide a new algorithm called Behavior Constrained Thompson Sampling (BCTS) that allows for online learning while obeying exogenous constraints. Our agent learns a constrained policy that implements the observed behavioral constraints demonstrated by a teacher agent, and then uses this constrained policy to guide the reward-based online exploration and exploitation. We characterize the upper bound on the expected regret of the contextual bandit algorithm that underlies our agent and provide a case study with real-world data in two application domains. Our experiments show that the designed agent is able to act within the set of behavior constraints without significantly degrading its overall reward performance.
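The abstract does not give BCTS's update rules. As a rough, hypothetical illustration of the general idea (learn from a teacher's demonstrations which actions are permitted, then run Thompson sampling on reward feedback only over those actions), the sketch below uses Beta-Bernoulli Thompson sampling over discrete contexts. The class name `ConstrainedThompsonSampler`, the frequency `threshold` that turns demonstrations into a set of permitted arms, and the discrete-context simplification are all assumptions made for illustration; this is not the paper's algorithm.

```python
import random
from collections import defaultdict


class ConstrainedThompsonSampler:
    """Hypothetical sketch (not the paper's BCTS): Beta-Bernoulli Thompson
    sampling over discrete contexts, restricted to arms a teacher was seen using."""

    def __init__(self, n_arms, threshold=0.1, seed=None):
        self.n_arms = n_arms
        # Assumption: an arm is permitted in a context if the teacher chose it
        # in at least `threshold` of its demonstrations for that context.
        self.threshold = threshold
        self.rng = random.Random(seed)
        # Beta(1, 1) prior on each (context, arm) reward probability.
        self.alpha = defaultdict(lambda: [1.0] * n_arms)
        self.beta = defaultdict(lambda: [1.0] * n_arms)
        # Observed teacher behavior: how often each arm was chosen per context.
        self.teacher_counts = defaultdict(lambda: [0] * n_arms)

    def observe_teacher(self, context, arm):
        """Record one demonstration: in `context`, the teacher chose `arm`."""
        self.teacher_counts[context][arm] += 1

    def allowed_arms(self, context):
        """Arms permitted by the learned constraints for this context."""
        counts = self.teacher_counts[context]
        total = sum(counts)
        if total == 0:
            # No demonstrations for this context: leave it unconstrained.
            return list(range(self.n_arms))
        allowed = [a for a in range(self.n_arms) if counts[a] / total >= self.threshold]
        # If the threshold filters out everything, keep the teacher's most frequent arm.
        return allowed or [max(range(self.n_arms), key=counts.__getitem__)]

    def select_arm(self, context):
        """Draw one posterior sample per permitted arm and play the largest."""
        alpha, beta = self.alpha[context], self.beta[context]
        samples = {a: self.rng.betavariate(alpha[a], beta[a])
                   for a in self.allowed_arms(context)}
        return max(samples, key=samples.get)

    def update(self, context, arm, reward):
        """Update the Beta posterior with a Bernoulli (0/1) reward."""
        if reward:
            self.alpha[context][arm] += 1
        else:
            self.beta[context][arm] += 1


if __name__ == "__main__":
    agent = ConstrainedThompsonSampler(n_arms=3, threshold=0.2, seed=0)
    # The teacher only ever uses arms 0 and 1 in context "a".
    for _ in range(10):
        agent.observe_teacher("a", 0)
        agent.observe_teacher("a", 1)
    # Arm 2 has the highest reward rate but lies outside the learned constraints.
    true_p = [0.3, 0.6, 0.9]
    env = random.Random(1)
    picks = []
    for _ in range(500):
        arm = agent.select_arm("a")
        agent.update("a", arm, env.random() < true_p[arm])
        picks.append(arm)
    print("arm counts:", [picks.count(a) for a in range(3)])
```

In this sketch the learned constraints act as a hard filter, so the agent never plays arm 2 even though it pays best; within the permitted set, Thompson sampling still shifts toward the better-rewarded arm. A softer alternative would weight the posterior samples by the teacher policy's probabilities, trading constraint adherence against reward; which trade-off BCTS makes is not stated in the abstract.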
