Strategic Advice Provision in Repeated Human-Agent Interactions

Abstract

This paper addresses the problem of automated advice provision in settings that involve repeated interactions between people and computer agents. This problem arises in many real-world applications, such as route selection systems and office assistants. To succeed in such settings, agents must reason about how their present actions influence people's future actions. This work models such settings as a family of repeated bilateral games of incomplete information called "choice selection processes", in which players may share certain goals but are essentially self-interested. The paper describes several possible models of human behavior inspired by behavioral economic theories of people's play in repeated interactions. These models were incorporated into several agent designs that repeatedly generate offers to people playing the game. The agents were evaluated in extensive empirical investigations involving hundreds of subjects who interacted with computers in different choice selection processes. The results revealed that an agent that combined a hyperbolic discounting model of human behavior with a social utility function outperformed alternative agent designs, including an agent that approximated the optimal strategy using continuous MDPs and an agent that used epsilon-greedy strategies to describe people's behavior. We show that this approach generalized to new people as well as to choice selection processes that were not used for training. Our results demonstrate that combining computational approaches with behavioral economics models of people in repeated interactions facilitates the design of advice provision strategies for a large class of real-world settings.
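To make the two ingredients named in the abstract concrete, here is a minimal Python sketch, not the authors' implementation. It assumes the standard hyperbolic discount form 1 / (1 + k * d) applied to outcomes observed d rounds ago, and it assumes a social utility that is a simple weighted sum of the agent's and the person's payoffs. The function names and the parameters `k` and `w` are hypothetical and do not come from the paper.

```python
def hyperbolic_weights(n_rounds, k):
    """Weight for an outcome observed d rounds ago: 1 / (1 + k * d).

    Assumption: the standard hyperbolic discount form; k >= 0 is a
    hypothetical discount-rate parameter.
    """
    return [1.0 / (1.0 + k * d) for d in range(n_rounds)]


def discounted_value(past_rewards, k):
    """Hyperbolically weighted average of past rewards.

    past_rewards is ordered oldest first, so the last element is the
    most recent outcome and gets delay d = 0 (weight 1).
    """
    if not past_rewards:
        return 0.0
    weights = hyperbolic_weights(len(past_rewards), k)
    total = sum(wt * r for wt, r in zip(weights, reversed(past_rewards)))
    return total / sum(weights)


def social_utility(agent_payoff, person_payoff, w):
    """Assumed social utility: a convex combination of both payoffs.

    w in [0, 1] is a hypothetical weight on the agent's own payoff.
    """
    return w * agent_payoff + (1.0 - w) * person_payoff
```

Under these assumptions, an agent could score each candidate piece of advice with `social_utility` and use `discounted_value` to estimate how strongly a person's recent experience with an option outweighs older experience. How the paper actually combines the two models is not specified in the abstract.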