International Conference on Control, Automation and Robotics

Forward-Looking Imaginative Planning Framework Combined with Prioritized-Replay Double DQN

Abstract

Many machine learning systems are built to solve the toughest planning problems. Such systems usually adopt a one-size-fits-all approach across different planning problems, which can waste precious computing resources on simple problems while investing too little in complex ones. This calls for a new framework that does not learn a single, fixed strategy, but instead introduces a set of decision controllers that resolve various planning tasks by learning to construct, predict, and evaluate plans. We therefore propose a forward-looking imaginative planning framework combined with Prioritized-Replay Double DQN, a model-based continuous decision controller that determines how many iterations of the decision-making process to run and which model to consult in each iteration. Before taking any single action in the environment, it can imagine and select actions based on the current state, including limited-step look-ahead imagination, and evaluate them with its model-based imagination. All imagined actions and outcomes are iteratively integrated into a "plan environment", which can test alternative imagined actions and flexibly apply a learned policy in previously imagined states. On this basis, prioritized experience replay is adopted to adjust sampling weights and improve training efficiency, so that the overall cost, including both task loss and computation cost, is lower than that of the traditional fixed-strategy method.
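The abstract names Prioritized-Replay Double DQN as the value-learning component of the framework. Below is a minimal sketch of that component only, assuming a standard proportional-priority replay buffer and a small PyTorch Q-network; the names (`QNetwork`, `PrioritizedReplay`, `double_dqn_loss`) and all hyperparameters are illustrative, and the paper's imagination controller and "plan environment" are not modeled here.

```python
from collections import deque

import numpy as np
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Small MLP Q-network; the architecture is a placeholder assumption."""

    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, x):
        return self.net(x)


class PrioritizedReplay:
    """Proportional prioritized replay (simplified; no sum-tree)."""

    def __init__(self, capacity=10000, alpha=0.6):
        self.buffer = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.alpha = alpha

    def push(self, transition, td_error=1.0):
        # New transitions get priority proportional to |TD error|^alpha.
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + 1e-5) ** self.alpha)

    def sample(self, batch_size, beta=0.4):
        probs = np.array(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.buffer), batch_size, p=probs)
        # Importance-sampling weights correct for the non-uniform sampling.
        weights = (len(self.buffer) * probs[idx]) ** (-beta)
        weights = weights / weights.max()
        batch = [self.buffer[i] for i in idx]
        return batch, idx, torch.tensor(weights, dtype=torch.float32)

    def update_priorities(self, idx, td_errors):
        for i, e in zip(idx, td_errors):
            self.priorities[i] = (abs(e) + 1e-5) ** self.alpha


def double_dqn_loss(online, target, batch, weights, gamma=0.99):
    """Double DQN target: the online net selects the action, the target net evaluates it."""
    states, actions, rewards, next_states, dones = map(
        lambda x: torch.as_tensor(np.array(x), dtype=torch.float32), zip(*batch))
    actions = actions.long()
    q = online(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        best_next = online(next_states).argmax(dim=1, keepdim=True)
        q_next = target(next_states).gather(1, best_next).squeeze(1)
        target_q = rewards + gamma * (1.0 - dones) * q_next
    td_errors = target_q - q
    # Weighted MSE: per-sample importance weights come from the prioritized buffer.
    loss = (weights * td_errors.pow(2)).mean()
    return loss, td_errors.detach().numpy()
```

In the full framework described by the abstract, transitions from both real interactions and imagined rollouts would presumably be pushed into such a buffer, with the controller deciding how many imagination steps to run before each real action; those parts depend on the paper's model details and are not sketched here.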