Conference: IEEE International Conference on Automation Science and Engineering

Comparison study of two reinforcement learning based real-time control policies for two-machine-one-buffer production system

Abstract

A real-time control policy for a production system is an attractive way to reduce the total cost, which consists mainly of the production cost, the penalty for permanent production loss, and the work-in-process (WIP) inventory cost. Starvation and blocking, random failures, and maintenance make a production system difficult to analyze, let alone to control well. Two reinforcement-learning-based control policies are proposed, whose actions are to switch the machines on or off at the start of each time slot. Samples collected from a simulation model are used to obtain two suboptimal policies, named LSPI and TH. The TH policy is a simplified form of LSPI, while LSPI performs better at reducing the total production cost.
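To make the setting concrete, the following is a minimal sketch of a slotted two-machine-one-buffer simulation with on/off decisions at the start of each time slot. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`simulate`, `always_on`, `make_threshold_policy`), the failure and repair probabilities, the cost weights, and the buffer-threshold rule, which may or may not resemble the paper's TH policy. It does not implement LSPI.

```python
import random


def simulate(policy, slots=1000, cap=5, p_fail=0.05, p_repair=0.3,
             c_prod=1.0, c_loss=5.0, c_wip=0.2, seed=0):
    """Run one simulated episode; return (total_cost, max_buffer_level).

    All parameters are hypothetical illustration values, not the paper's.
    """
    rng = random.Random(seed)
    up = [True, True]   # health of machine 1 and machine 2
    buffer = 0          # parts currently waiting between the machines
    total_cost = 0.0
    max_buffer = 0

    for _ in range(slots):
        # Control decision at the start of the slot: switch each machine on/off.
        on1, on2 = policy(buffer, up[0], up[1])

        # Machine 2 finishes a part unless it is off, down, or starved.
        m2_runs = on2 and up[1] and buffer > 0
        # Machine 1 adds a part unless it is off, down, or blocked.
        m1_runs = on1 and up[0] and buffer < cap

        buffer += int(m1_runs) - int(m2_runs)
        max_buffer = max(max_buffer, buffer)

        # Assumed cost terms: running cost, loss penalty, WIP holding cost.
        total_cost += c_prod * (int(m1_runs) + int(m2_runs))
        if not m2_runs:
            total_cost += c_loss  # no finished part in this slot
        total_cost += c_wip * buffer

        # Random failures and repairs take effect from the next slot.
        for i in range(2):
            if up[i]:
                up[i] = rng.random() >= p_fail
            else:
                up[i] = rng.random() < p_repair

    return total_cost, max_buffer


def always_on(buffer, up1, up2):
    """Baseline: keep both machines switched on."""
    return True, True


def make_threshold_policy(threshold):
    """Hypothetical rule: run machine 1 only while the buffer is below threshold."""
    def policy(buffer, up1, up2):
        return buffer < threshold, True
    return policy


if __name__ == "__main__":
    for name, pol in [("always on", always_on),
                      ("threshold 2", make_threshold_policy(2))]:
        cost, peak = simulate(pol)
        print(f"{name}: total cost {cost:.1f}, peak buffer {peak}")
```

In a learning setup like the one the abstract describes, a simulator of this kind would be the source of the samples from which a policy is fitted. The two fixed rules above stand in for such a learned policy.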