
Manufacturing Dispatching Using Reinforcement and Transfer Learning


Abstract

Efficient dispatching rules in the manufacturing industry are key to ensuring on-time product delivery and minimizing past-due and inventory costs. Manufacturing, especially in the developed world, is moving toward on-demand manufacturing, i.e., a high-mix, low-volume product mix. This requires efficient dispatching that works in dynamic and stochastic environments, allowing quick response to newly received orders and operating across a disparate set of shop-floor settings. In this paper we address this dispatching problem in manufacturing. Using reinforcement learning (RL), we propose a new design that formulates the shop-floor state as a 2-D matrix, incorporates job slack time into the state representation, and designs lateness and tardiness reward functions for dispatching. However, maintaining a separate RL model for each production line on a manufacturing shop floor is costly and often infeasible. To address this, we enhance our deep RL model with an approach for dispatching policy transfer, which increases policy generalization and saves time and cost in model training and data collection. Experiments show that (1) our approach performs best in terms of total discounted reward and average lateness and tardiness, and (2) the proposed policy transfer approach reduces training time and increases policy generalization.
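The abstract does not spell out its formulas, but in standard scheduling terminology a job j with completion time C_j and due date d_j has lateness L_j = C_j - d_j and tardiness T_j = max(0, L_j), and its slack at time t is d_j - t minus its remaining processing time. The sketch below illustrates how a slack-aware 2-D state matrix and a lateness/tardiness reward might look; the function names, the two-column matrix layout, and the negative-weighted-sum reward shape are assumptions for illustration, not the paper's exact design.

```python
import numpy as np

def slack_time(due_date, now, remaining_proc_time):
    """Slack of a job: time left until its due date minus remaining processing time."""
    return due_date - now - remaining_proc_time

def build_state(jobs, now, n_rows=10, n_cols=2):
    """Encode the shop floor as a 2-D matrix: one row per queued job,
    columns = [remaining processing time, slack time], zero-padded.
    The paper's actual matrix layout may differ; this only shows the idea."""
    state = np.zeros((n_rows, n_cols))
    for i, job in enumerate(jobs[:n_rows]):
        state[i, 0] = job["remaining_proc_time"]
        state[i, 1] = slack_time(job["due_date"], now, job["remaining_proc_time"])
    return state

def dispatch_reward(completed_jobs, w_late=1.0, w_tardy=1.0):
    """Hypothetical reward: negative weighted sum of lateness and tardiness
    over jobs completed in the current step."""
    reward = 0.0
    for job in completed_jobs:
        lateness = job["completion_time"] - job["due_date"]  # negative if early
        tardiness = max(0.0, lateness)                       # only the late part
        reward -= w_late * lateness + w_tardy * tardiness
    return reward
```

Under this kind of formulation, the described policy transfer would amount to initializing a new production line's policy network from weights trained on another line and fine-tuning, rather than training each line's RL model from scratch.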
