Applied Soft Computing

Dynamic scheduling for flexible job shop with new job insertions by deep reinforcement learning

Abstract

In the modern manufacturing industry, dynamic scheduling methods are urgently needed as uncertainty and complexity in the production process increase sharply. To this end, this paper addresses the dynamic flexible job shop scheduling problem (DFJSP) under new job insertions, aiming at minimizing the total tardiness. Without loss of generality, the DFJSP can be modeled as a Markov decision process (MDP) where an intelligent agent successively determines which operation to process next and which machine to assign it to according to the production status at the current decision point, making the problem particularly suitable for reinforcement learning (RL) methods. In order to cope with continuous production states and learn the most suitable action (i.e., dispatching rule) at each rescheduling point, a deep Q-network (DQN) is developed to address this problem. Six composite dispatching rules are proposed to simultaneously select an operation and assign it to a feasible machine every time an operation is completed or a new job arrives. Seven generic state features are extracted to represent the production status at a rescheduling point. By taking the continuous state features as input to the DQN, the state-action value (Q-value) of each dispatching rule can be obtained. The proposed DQN is trained using deep Q-learning (DQL) enhanced by two improvements, namely double DQN and soft target weight updates. Moreover, a "softmax" action selection policy is utilized in the real implementation of the trained DQN so as to favor rules with higher Q-values while maintaining the policy entropy. Numerical experiments are conducted on a large number of instances with different production configurations. The results confirm both the superiority and generality of the DQN compared to each composite rule, other well-known dispatching rules, and the standard Q-learning-based agent. (C) 2020 Elsevier B.V. All rights reserved.
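The decision loop described in the abstract can be illustrated compactly. The following is a minimal PyTorch sketch, not the authors' implementation: a small Q-network maps the seven continuous state features to Q-values over the six composite dispatching rules, training uses a double-DQN target with soft target-weight updates, and rule selection at each rescheduling point uses a softmax over Q-values. All class names, hyperparameters (hidden size, gamma, tau, temperature) and the replay-buffer setup are illustrative assumptions.

```python
import copy
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

N_STATE_FEATURES = 7   # generic production-state features at a rescheduling point
N_RULES = 6            # composite dispatching rules (the actions)

class QNetwork(nn.Module):
    """Maps a production state to a Q-value for each dispatching rule."""
    def __init__(self, n_in=N_STATE_FEATURES, n_out=N_RULES, hidden=64):
        super().__init__()
        self.fc1 = nn.Linear(n_in, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, n_out)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.out(x)

class DQNAgent:
    def __init__(self, gamma=0.95, lr=1e-3, tau=0.005, temperature=1.0):
        self.q = QNetwork()
        self.q_target = copy.deepcopy(self.q)   # separate target network
        self.opt = torch.optim.Adam(self.q.parameters(), lr=lr)
        self.buffer = deque(maxlen=50_000)      # replay buffer of transitions
        self.gamma, self.tau, self.temperature = gamma, tau, temperature

    def act(self, state):
        """Softmax policy over Q-values: favors high-Q rules but keeps entropy."""
        with torch.no_grad():
            q = self.q(torch.as_tensor(state, dtype=torch.float32))
            probs = torch.softmax(q / self.temperature, dim=-1)
        return torch.multinomial(probs, 1).item()

    def remember(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def train_step(self, batch_size=64):
        if len(self.buffer) < batch_size:
            return
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s2, d = (torch.as_tensor(x, dtype=torch.float32)
                          for x in zip(*batch))
        a = a.long()
        q_sa = self.q(s).gather(1, a.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            # Double DQN: the online net picks the next action, the target net evaluates it.
            a_next = self.q(s2).argmax(dim=1, keepdim=True)
            q_next = self.q_target(s2).gather(1, a_next).squeeze(1)
            target = r + self.gamma * (1.0 - d) * q_next
        loss = F.mse_loss(q_sa, target)
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Soft target-weight update: theta_target <- tau * theta + (1 - tau) * theta_target
        for p, p_t in zip(self.q.parameters(), self.q_target.parameters()):
            p_t.data.mul_(1.0 - self.tau).add_(self.tau * p.data)
```

In such a setup, act() would be called whenever an operation finishes or a new job arrives, taking the seven extracted state features as input; the returned index selects the composite rule that simultaneously chooses an operation and assigns it to a feasible machine.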