Incorporating Behavioral Constraints in Online AI Systems

Abstract

AI systems that learn through reward feedback about the actions they take are increasingly deployed in domains that have significant impact on our daily life. However, in many cases the online rewards should not be the only guiding criterion, as there are additional constraints and/or priorities imposed by regulations, values, preferences, or ethical principles. We detail a novel online agent that learns a set of behavioral constraints by observation and uses these learned constraints as a guide when making decisions in an online setting while still being reactive to reward feedback. To define this agent, we propose to adopt a novel extension to the classical contextual multi-armed bandit setting and we provide a new algorithm called Behavior Constrained Thompson Sampling (BCTS) that allows for online learning while obeying exogenous constraints. Our agent learns a constrained policy that implements the observed behavioral constraints demonstrated by a teacher agent, and then uses this constrained policy to guide the reward-based online exploration and exploitation. We characterize the upper bound on the expected regret of the contextual bandit algorithm that underlies our agent and provide a case study with real-world data in two application domains. Our experiments show that the designed agent is able to act within the set of behavioral constraints without significantly degrading its overall reward performance.
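The two-phase idea in the abstract (learn a constrained policy from a teacher, then let it guide reward-driven exploration) can be illustrated with a small sketch. This is not the authors' BCTS algorithm as published; it is a minimal illustration under stated assumptions: a per-arm linear Thompson sampler, a hypothetical teacher signal of 1 for allowed arms and 0 for a forbidden arm, and a hypothetical mixing weight `lam` that blends the teacher-derived scores with the online reward scores. All class, function, and parameter names are invented for the example.

```python
import numpy as np


class LinearThompsonSampler:
    """Per-arm Bayesian linear regression with Gaussian posterior sampling."""

    def __init__(self, n_arms, dim, v=0.2, seed=0):
        self.rng = np.random.default_rng(seed)
        self.v = v  # posterior scale: larger means more exploration
        self.B = np.stack([np.eye(dim) for _ in range(n_arms)])  # precision matrices
        self.f = np.zeros((n_arms, dim))  # feedback-weighted context sums

    def sample_scores(self, x):
        """Draw one parameter sample per arm and score the context x."""
        scores = np.empty(len(self.B))
        for k in range(len(self.B)):
            cov = np.linalg.inv(self.B[k])
            mean = cov @ self.f[k]
            # Sample theta ~ N(mean, v^2 * cov) via a Cholesky factor.
            z = self.rng.standard_normal(len(x))
            theta = mean + self.v * np.linalg.cholesky(cov) @ z
            scores[k] = x @ theta
        return scores

    def update(self, arm, x, feedback):
        self.B[arm] += np.outer(x, x)
        self.f[arm] += feedback * x


def choose_arm(constraint_model, reward_model, x, lam):
    """Blend scores: lam weights the teacher-derived model, 1 - lam the reward model."""
    blended = (lam * constraint_model.sample_scores(x)
               + (1.0 - lam) * reward_model.sample_scores(x))
    return int(np.argmax(blended))


def run(lam, rounds=300, seed=0):
    """Return how often the forbidden arm is pulled during the online phase."""
    rng = np.random.default_rng(seed)
    n_arms, dim, forbidden = 3, 3, 2
    constraint_model = LinearThompsonSampler(n_arms, dim, seed=seed + 1)
    reward_model = LinearThompsonSampler(n_arms, dim, seed=seed + 2)

    def context():
        # Bias feature plus two random features (toy data, assumption).
        return np.array([1.0, rng.uniform(), rng.uniform()])

    # Teacher phase (assumed encoding): 1 for allowed arms, 0 for the forbidden arm.
    for _ in range(200):
        x = context()
        for arm in range(n_arms):
            constraint_model.update(arm, x, 0.0 if arm == forbidden else 1.0)

    # Online phase: the forbidden arm pays the highest reward, creating a conflict.
    forbidden_pulls = 0
    for _ in range(rounds):
        x = context()
        arm = choose_arm(constraint_model, reward_model, x, lam)
        reward = 1.0 if arm == forbidden else 0.2
        reward_model.update(arm, x, reward)
        forbidden_pulls += int(arm == forbidden)
    return forbidden_pulls
```

Calling `run(lam=0.0)` gives a purely reward-driven agent, while `run(lam=0.9)` leans heavily on the teacher-derived model. In this toy setup the second agent should pull the forbidden arm far less often, which mirrors the trade-off the abstract describes between obeying constraints and pursuing reward.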


Bibliographic details

Source
AAAI Conference on Artificial Intelligence | 2019 | 832p | 9 pages
Authors
Avinash Balakrishnan; Djallel Bouneffouf; Nicholas Mattei; Francesca Rossi;
Original format: PDF
Chinese Library Classification: TP18-53

