Reinforcement Learning Based Decision Making of Operational Indices in Process Industry Under Changing Environment

Liu Chao; Ding Jinliang; Sun Jiyuan

Abstract

The plant-wide production process is composed of multiple unit processes, in which the operational indices of each unit process are assigned and adjusted according to product quality, yield, and actual operating modes. Because the operational conditions of the production process change, most model-based methods and evolutionary computation approaches cannot adjust the operational indices effectively. In this article, the decision making of operational indices is formulated as a continuous-state, continuous-action reinforcement learning (RL) problem, and a model-free RL algorithm is proposed that learns a decision policy for determining the operational indices according to the actual operational conditions. Unlike existing methods, this article presents a multiactor networks ensemble algorithm and an actor-critic framework with a stochastic policy to avoid falling into local optima. A relatively globally optimal policy is obtained by extracting the results of the parallel training of the multiactor networks, which guarantees the optimality of the obtained policy. In addition, experience replay is used, which is particularly valuable for dealing with the lack of sampling data in model-free RL. Simulation studies are conducted on actual data from a mineral processing plant, and the results demonstrate the effectiveness of the proposed algorithm.
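The abstract names three ingredients: experience replay, a stochastic policy, and an ensemble of actors from which one policy is extracted. The sketch below illustrates those ideas in plain Python. It is not the authors' algorithm: the uniform-sampling buffer, the Gaussian action noise, the "pick the highest-scoring actor" extraction rule, and every name in it (`ReplayBuffer`, `gaussian_policy`, `select_best_actor`, `score_fn`) are assumptions made for illustration only.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state) transitions.

    Assumption: uniform sampling; the abstract does not say how the
    paper samples from its replay memory.
    """

    def __init__(self, capacity):
        # The oldest transitions are dropped once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        # Reusing stored transitions is what eases the data shortage.
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


def gaussian_policy(mean_fn, std):
    """Wrap a deterministic actor as a stochastic policy (assumed Gaussian)."""

    def act(state, rng=random):
        return [rng.gauss(m, std) for m in mean_fn(state)]

    return act


def select_best_actor(actors, score_fn, states):
    """Return the index of the actor with the highest average score.

    `score_fn(state, action)` is a hypothetical stand-in for a critic's
    value estimate; the paper's actual extraction rule is not given here.
    """

    def avg_score(mean_fn):
        return sum(score_fn(s, mean_fn(s)) for s in states) / len(states)

    scores = [avg_score(actor) for actor in actors]
    return max(range(len(actors)), key=scores.__getitem__)
```

In a full training loop, each actor in the ensemble would be updated from minibatches drawn with `sample`, and `select_best_actor` would be called afterwards on a set of held-out states.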

Bibliographic details

Source
IEEE Transactions on Industrial Informatics, 2021, no. 4, pp. 2727-2736 (10 pages)
Authors
Liu Chao; Ding Jinliang; Sun Jiyuan
Affiliation
State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, People's Republic of China (listed for all three authors)
Original format: PDF
Language: English
Keywords
Actor-critic (AC) algorithm; experience replay; multiactor networks ensemble (MAE); operational indices; process industry; reinforcement learning (RL)

Similar literature

1. A Finite Horizon Markov Decision Process Based Reinforcement Learning Control of a Rapid Thermal Processing System [J]. Pradeep D. John, Noel Mathew Mithra. Journal of Process Control. 2018
2. A Reinforcement Learning-Based Markov-Decision Process (MDP) Implementation for SRAM FPGAs [J]. Ruan Aiwu, Shi Aokai, Qin Liang. IEEE Transactions on Circuits and Systems II: Express Briefs. 2020, no. 10
3. Task Offloading with Power Control for Mobile Edge Computing Using Reinforcement Learning-Based Markov Decision Process [J]. Bingxin Zhang, Guopeng Zhang, Weice Sun. Mobile Information Systems. 2020, no. 1
4. Autonomous Driving using Safe Reinforcement Learning by Incorporating a Regret-based Human Lane-Changing Decision Model [C]. Dong Chen, Longsheng Jiang, Yue Wang. Annual American Control Conference. 2020
5. A New Reinforcement Learning Algorithm with Fixed Exploration for Semi-Markov Decision Processes [D]. Encapera, Angelo Michael. 2017
6. An extended reinforcement learning model of basal ganglia to understand the contributions of serotonin and dopamine in risk-based decision making, reward prediction and punishment learning [O]. Pragathi P. Balasubramani, V. Srinivasa Chakravarthy, Balaraman Ravindran. 2014
7. Automated Lane Change Decision Making using Deep Reinforcement Learning in Dynamic and Uncertain Highway Environment [O]. Ali Alizadeh, Majid Moghadam, Yunus Bicer. 2019
