Naval Research Logistics

A Least Squares Temporal Difference Actor-Critic Algorithm with Applications to Warehouse Management

Abstract

This article develops a new approximate dynamic programming (DP) algorithm for Markov decision problems and applies it to a vehicle dispatching problem arising in warehouse management. The algorithm is of the actor-critic type and uses least squares temporal difference (LSTD) learning. It operates on a sample path of the system and optimizes the policy within a prespecified class parameterized by a parsimonious set of parameters. The method applies to partially observable Markov decision process (POMDP) settings in which measurements of the state variables may be corrupted and the cost is observed only through these imperfect state observations. We show that, under reasonable assumptions, the algorithm converges to a locally optimal parameter set. We also show that imperfect cost observations do not affect the policy, and that the algorithm minimizes the true expected cost. In the warehouse application, the problem is to dispatch sensor-equipped forklifts so as to minimize operating costs arising from product movement delays and forklift maintenance. We consider instances in which standard DP is computationally intractable. Simulation results confirm the theoretical claims of the article and show that the algorithm converges more smoothly than earlier actor-critic algorithms while substantially outperforming heuristics used in practice.
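The abstract describes the algorithm only at a high level. As a rough illustration of the general technique, here is a minimal sketch of an average-cost actor-critic with an LSTD(λ) critic, in the style of Konda and Tsitsiklis, where the critic's features are the policy's score function ψ_θ(x,u) = ∇_θ log μ_θ(u|x). This is not the article's exact recursion. The toy queue model (not the article's warehouse model), the logistic policy, the step-size schedules, the bounds `r_max`/`theta_max`, and every name (`lstd_actor_critic`, `psi`, `step`, and so on) are hypothetical. The state is observed exactly here. Imperfect cost observations are modeled, purely for illustration, as additive zero-mean Gaussian noise.

```python
import numpy as np

# Hypothetical toy dispatch problem (NOT the article's warehouse model):
# state x in {0..N} = items waiting; u=1 dispatches a forklift that clears
# the queue, u=0 waits. Per-step cost = queue length + dispatch cost.
N, DISPATCH_COST, ARRIVAL_P = 10, 5.0, 0.5


def features(x: int) -> np.ndarray:
    """Features of the dispatch logit: [bias, normalized queue length]."""
    return np.array([1.0, x / N])


def dispatch_prob(theta: np.ndarray, x: int) -> float:
    """mu_theta(u=1 | x) for a logistic (two-action softmax) policy."""
    return float(1.0 / (1.0 + np.exp(-(theta @ features(x)))))


def psi(theta: np.ndarray, x: int, u: int) -> np.ndarray:
    """Score function grad_theta log mu_theta(u|x); also the critic's features."""
    return (u - dispatch_prob(theta, x)) * features(x)


def step(x: int, u: int, rng: np.random.Generator) -> tuple[float, int]:
    """Incur the cost of (x, u), then move to the next queue length."""
    cost = x + DISPATCH_COST * u
    x_next = (0 if u else x) + int(rng.random() < ARRIVAL_P)
    return cost, min(x_next, N)


def lstd_actor_critic(num_steps: int = 20_000, lam: float = 0.9,
                      cost_noise_std: float = 0.0, r_max: float = 50.0,
                      theta_max: float = 20.0, seed: int = 0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    A, b, z = np.zeros((2, 2)), np.zeros(2), np.zeros(2)
    avg_cost = 0.0
    x = 0
    u = int(rng.random() < dispatch_prob(theta, x))
    for k in range(num_steps):
        gamma = 1.0 / (k + 1) ** 0.6   # critic step size (faster timescale)
        beta = 1.0 / (k + 1) ** 0.9    # actor step size (slower timescale)
        psi_k = psi(theta, x, u)
        z = lam * z + psi_k            # eligibility trace
        cost, x_next = step(x, u, rng)
        # Hypothetical imperfect cost observation: zero-mean additive noise.
        cost_obs = cost + cost_noise_std * rng.standard_normal()
        u_next = int(rng.random() < dispatch_prob(theta, x_next))
        psi_next = psi(theta, x_next, u_next)
        # LSTD(lambda) running averages: A ~ E[z (psi' - psi)^T], b ~ E[z (g - alpha)].
        A += gamma * (np.outer(z, psi_next - psi_k) - A)
        b += gamma * (z * (cost_obs - avg_cost) - b)
        avg_cost += gamma * (cost_obs - avg_cost)
        # Critic: solve A r = -b in the least-squares sense (A may be singular early).
        r = np.linalg.lstsq(A, -b, rcond=None)[0]
        r *= min(1.0, r_max / (np.linalg.norm(r) + 1e-12))  # keep r bounded
        # Actor: gradient descent on average cost using Q_hat(x', u') = r . psi'.
        theta = theta - beta * (r @ psi_next) * psi_next
        theta = np.clip(theta, -theta_max, theta_max)
        x, u = x_next, u_next
    return theta, avg_cost
```

For example, `theta, avg = lstd_actor_critic()` followed by `[dispatch_prob(theta, x) for x in range(N + 1)]` lists the learned dispatch probability for each queue length. The two step-size schedules make the critic adapt faster than the actor (β_k/γ_k → 0), the usual two-timescale arrangement. Because the critic's features are ψ_θ, the product (r·ψ)ψ is a sampled estimate of the average-cost policy gradient.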