IEEE Transactions on Systems, Man, and Cybernetics

An MDP Model-Based Reinforcement Learning Approach for Production Station Ramp-Up Optimization: Q-Learning Analysis

Abstract

Ramp-up is a significant bottleneck in the introduction of new or adapted manufacturing systems. The effort and time required to ramp up a system depend largely on the effectiveness of the human decision-making process in selecting the most promising sequence of actions to bring the system to the required level of performance. Although existing work has identified significant factors influencing the effectiveness of ramp-up, little has been done to support decision making during the process. This paper approaches ramp-up as a sequential adjustment and tuning process that aims to bring a manufacturing system to a desired level of performance in the shortest possible time. Production stations and machines are the key resources in a manufacturing system. They are often functionally decoupled and can, in the first instance, be treated as independent ramp-up problems. Hence, this paper focuses on developing a Markov decision process (MDP) model that formalizes the ramp-up of production stations and enables their formal analysis. The aim is to capture the cause-and-effect relationships between an operator's adaptation or adjustment of a station and the station's response, in order to improve the effectiveness of the process. Reinforcement learning has been identified as a promising approach for learning from ramp-up experience and discovering more successful decision-making policies. Batch learning in particular can perform well with little data. This paper investigates the application of a batch Q-learning algorithm combined with an MDP model of the ramp-up process. The approach has been applied to a highly automated production station on which several ramp-up processes were carried out. The convergence of the Q-learning algorithm has been analyzed, along with the effect of varying its parameters. Finally, the learned policy has been applied and compared against previous ramp-up cases.
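The core technique named in the abstract is batch Q-learning over an MDP model of the station. As a rough illustration only, the Python sketch below replays a fixed batch of logged (state, action, reward, next state) transitions and repeatedly applies the standard Q-learning update Q(s,a) ← Q(s,a) + α[r + γ·max_a' Q(s',a') − Q(s,a)]. The state labels, adjustment actions, rewards, and parameter values are hypothetical placeholders; the paper's actual station model and MDP state/action spaces are not reproduced on this page.

```python
import random
from collections import defaultdict

ALPHA = 0.1  # learning rate (illustrative value)
GAMMA = 0.9  # discount factor (illustrative value)

def q_batch_learning(batch, num_sweeps=100):
    """Batch (offline) Q-learning: replay a fixed set of logged
    (state, action, reward, next_state) transitions repeatedly."""
    q = defaultdict(float)                   # Q[(state, action)] -> value
    actions = {a for (_, a, _, _) in batch}  # actions observed in the batch
    for _ in range(num_sweeps):
        random.shuffle(batch)                # decorrelate the sweep order
        for s, a, r, s_next in batch:
            best_next = max(q[(s_next, b)] for b in actions)
            q[(s, a)] += ALPHA * (r + GAMMA * best_next - q[(s, a)])
    return q

def greedy_policy(q, state, actions):
    """Pick the adjustment with the highest learned value in this state."""
    return max(actions, key=lambda a: q[(state, a)])

# Hypothetical ramp-up log: states are coarse performance levels of the
# station, actions are operator adjustments, rewards reflect the observed
# performance improvement after each adjustment.
batch = [
    ("low", "retune_feeder", 0.2, "medium"),
    ("low", "adjust_gripper", 0.0, "low"),
    ("medium", "adjust_gripper", 0.5, "high"),
    ("medium", "retune_feeder", 0.1, "medium"),
]
q = q_batch_learning(batch)
print(greedy_policy(q, "medium", {"retune_feeder", "adjust_gripper"}))
```

Replaying the same small batch many times is what lets the method perform well with little data, as the abstract notes: each sweep propagates reward information one step further back along the observed adjustment sequences.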
