Source: JMLR: Workshop and Conference Proceedings

Learning What Information to Give in Partially Observed Domains

Abstract

In many robotic applications, an autonomous agent must act within and explore a partially observed environment that is unobserved by its human team-mate. We consider such a setting in which the agent can, while acting, transmit declarative information to the human that helps them understand aspects of this unseen environment. In this work, we address the algorithmic question of how the agent should plan out what actions to take and what information to transmit. Naturally, one would expect the human to have preferences, which we model information-theoretically by scoring transmitted information based on the change it induces in weighted entropy of the human’s belief state. We formulate this setting as a belief MDP and give a tractable algorithm for solving it approximately. Then, we give an algorithm that allows the agent to learn the human’s preferences online, through exploration. We validate our approach experimentally in simulated discrete and continuous partially observed search-and-recover domains. Visit http://tinyurl.com/chitnis-corl-18 for a supplementary video.
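
The scoring idea in the abstract can be illustrated with a small sketch. The abstract does not give the exact formula, so everything below is an assumption made for illustration: the human's belief is a discrete distribution over states, the weighted entropy takes the standard form -sum_i w_i * b_i * log(b_i), a transmitted fact is modeled as a true/false mask over states, and the function names (`weighted_entropy`, `condition`, `information_score`) are hypothetical. The paper's actual definitions may differ.

```python
import math


def weighted_entropy(belief, weights):
    """Assumed form: -sum_i w_i * b_i * log(b_i), skipping zero-probability states."""
    return -sum(w * p * math.log(p) for p, w in zip(belief, weights) if p > 0)


def condition(belief, consistent):
    """Update the belief on a declarative fact.

    States inconsistent with the fact are zeroed out and the rest renormalized.
    """
    masked = [p if ok else 0.0 for p, ok in zip(belief, consistent)]
    total = sum(masked)
    if total == 0:
        raise ValueError("fact is inconsistent with every state in the belief")
    return [p / total for p in masked]


def information_score(belief, weights, consistent):
    """Score a fact by the drop in weighted entropy it induces (illustrative only)."""
    before = weighted_entropy(belief, weights)
    after = weighted_entropy(condition(belief, consistent), weights)
    return before - after


# Hypothetical example: an object is in one of four locations, uniform prior.
belief = [0.25, 0.25, 0.25, 0.25]
weights = [1.0, 1.0, 1.0, 1.0]  # uniform weights reduce to ordinary Shannon entropy

# Fact: "the object is not in locations 3 or 4".
score = information_score(belief, weights, [True, True, False, False])
print(score)
```

In this sketch, non-uniform weights stand in for the human's preferences: raising the weight on a state makes uncertainty about that state count for more in the score.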
