Coaching: Learning and using environment and agent models for advice.

Abstract

Coaching is a relationship in which one agent provides advice to another about how to act. This thesis explores a range of problems faced by an automated coach agent in providing advice to one or more automated advice-receiving agents. The coach's job is to help the agents perform as well as possible in their environment. We identify and address a set of technical challenges: How can the coach learn and use models of the environment? How should advice be adapted to the peculiarities of the advice receivers? How can opponents be modeled, and how can those models be used? How should advice be represented so that a team can use it effectively? This thesis serves both to define the coaching problem and to explore solutions to the challenges posed.

This thesis is inspired by a simulated robot soccer environment with a coach agent who can provide advice to a team in a standard language. The author developed this coach environment and standard language, in collaboration with others, as the thesis progressed. The experiments in this thesis represent the largest known empirical study in the simulated robot soccer environment. A predator-prey domain and a moving maze environment are used for additional experimentation. All algorithms are implemented in at least one of these environments, and empirical validation is performed.

In addition to the coach problem formulation and decompositions, the thesis makes several main technical contributions: (i) Several opponent model representations with associated learning algorithms, whose effectiveness in the robot soccer domain is demonstrated. (ii) A study of the effects of, and need for, coach learning under various limitations of the advice receiver and communication bandwidth. (iii) The Multi-Agent Simple Temporal Network, a multi-agent plan representation which is a refinement of a Simple Temporal Network, with an associated distributed plan execution algorithm. (iv) Algorithms for learning an abstract Markov Decision Process (MDP) from external observations, a given state abstraction, and partial abstract action templates. The use of the learned MDP for advice is explored in various scenarios.
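
As background for contribution (iii), the sketch below illustrates a plain Simple Temporal Network, not the thesis's Multi-Agent Simple Temporal Network or its distributed execution algorithm, neither of which is described here. It assumes each constraint bounds the time difference between two events, and it uses the standard result that such a network is consistent exactly when its distance graph has no negative cycle. The function name `stn_is_consistent` and the tuple layout `(i, j, lo, hi)` are illustrative choices, not taken from the thesis.

```python
import math


def stn_is_consistent(num_events, constraints):
    """Check a Simple Temporal Network for consistency.

    constraints: iterable of (i, j, lo, hi), each meaning
    lo <= t_j - t_i <= hi for event times t_i and t_j.
    (Hypothetical layout chosen for this sketch.)
    """
    # Distance graph: d[i][j] is the tightest known upper bound on t_j - t_i.
    d = [[0 if i == j else math.inf for j in range(num_events)]
         for i in range(num_events)]
    for i, j, lo, hi in constraints:
        d[i][j] = min(d[i][j], hi)    # t_j - t_i <= hi
        d[j][i] = min(d[j][i], -lo)   # t_i - t_j <= -lo

    # Floyd-Warshall all-pairs shortest paths tightens every bound.
    for k in range(num_events):
        for i in range(num_events):
            for j in range(num_events):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]

    # A negative diagonal entry signals a negative cycle, i.e. no schedule exists.
    return all(d[i][i] >= 0 for i in range(num_events))


# Event 1 follows event 0 by 1 to 3 time units, event 2 follows event 1 by 2 to 4.
feasible = [(0, 1, 1, 3), (1, 2, 2, 4), (0, 2, 0, 10)]
# Two steps of at least 5 units each cannot fit in a total window of at most 8.
infeasible = [(0, 1, 5, 10), (1, 2, 5, 10), (0, 2, 0, 8)]
```

Calling `stn_is_consistent(3, feasible)` and `stn_is_consistent(3, infeasible)` contrasts a schedulable plan with one whose constraints contradict each other.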
