Conference on Neural Information Processing Systems

Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control


Abstract

Autonomous agents situated in real-world environments must be able to master large repertoires of skills. While a single short skill can be learned quickly, it would be impractical to learn every task independently. Instead, the agent should share knowledge across behaviors such that each task can be learned efficiently, and such that the resulting model can generalize to new tasks, especially ones that are compositions or subsets of tasks seen previously. A policy conditioned on a goal or demonstration has the potential to share knowledge between tasks if it sees enough diversity of inputs. However, these methods may not generalize to a more complex task at test time. We introduce compositional plan vectors (CPVs) to enable a policy to perform compositions of tasks without additional supervision. CPVs represent trajectories as the sum of the subtasks within them. We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training. Analogously to embeddings such as word2vec in NLP, CPVs can also support simple arithmetic operations - for example, we can add the CPVs for two different tasks to command an agent to compose both tasks, without any additional training.
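The additive property described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the skill names, the embedding dimension, and the fixed random vectors standing in for a learned encoder are all assumptions made for illustration. The sketch only shows that when a trajectory's vector is the sum of its subtask vectors, adding the vectors of two tasks yields the vector of the composed task.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical skill names and embedding size; neither comes from the source.
SKILLS = ["chop_tree", "build_house", "make_bread"]
DIM = 8

# Stand-in for a learned encoder: one fixed random vector per skill.
# In the method described by the abstract these embeddings are learned.
skill_vectors = {name: rng.normal(size=DIM) for name in SKILLS}


def plan_vector(subtasks):
    """Return the toy plan vector of a trajectory: the sum of its subtask vectors."""
    total = np.zeros(DIM)
    for name in subtasks:
        total += skill_vectors[name]
    return total


# Plan vectors for two separate tasks.
task_a = plan_vector(["chop_tree"])
task_b = plan_vector(["build_house", "make_bread"])

# Adding the two vectors gives the vector of the task that contains both.
composed = task_a + task_b
direct = plan_vector(["chop_tree", "build_house", "make_bread"])

print(np.allclose(composed, direct))
```

In this toy the sum holds by construction and ignores subtask order. The abstract's claim is stronger: that embeddings with this property can be learned within a one-shot imitation learning framework without additional supervision.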
