
Asynchronous action-reward learning for nonstationary serial supply chain inventory control


Abstract

Action-reward learning is a reinforcement learning method in which an agent interacts with a non-deterministic control domain. The agent selects actions at decision epochs, and the control domain returns rewards that are used to update the performance measures of those actions. The agent's objective is to select the best future actions based on the updated performance measures. In this paper, we develop an asynchronous action-reward learning model that updates the performance measures of actions faster than conventional action-reward learning. This learning model is suited to nonstationary control domains, where the rewards for actions vary over time. Based on asynchronous action-reward learning, two situation-reactive inventory control models (centralized and decentralized) are proposed for a two-stage serial supply chain with nonstationary customer demand. A simulation-based experiment was performed to evaluate the performance of the two proposed models.
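
The select-act-update loop described in the abstract can be sketched roughly as follows. This is only an illustration of conventional action-reward learning with a constant step size, which lets recent rewards outweigh old ones when demand drifts; it is not the paper's asynchronous update rule, which the abstract does not specify. The class name, the epsilon-greedy selection, the cost parameters, and the demand process are all assumptions made for the example.

```python
import random


class ActionRewardLearner:
    """Hypothetical learner: one performance measure (value estimate) per action."""

    def __init__(self, actions, step_size=0.2, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.step_size = step_size  # constant step size: recent rewards weigh more
        self.epsilon = epsilon      # probability of exploring a random action
        self.values = {a: 0.0 for a in self.actions}
        self.rng = random.Random(seed)

    def select_action(self):
        # Explore with probability epsilon, otherwise pick the best-valued action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[a])

    def update(self, action, reward):
        # Move the selected action's estimate a fraction of the way toward the reward.
        self.values[action] += self.step_size * (reward - self.values[action])


def period_reward(order_up_to, demand, holding_cost=1.0, shortage_cost=4.0):
    """Assumed single-period reward: negative holding plus shortage cost."""
    leftover = max(order_up_to - demand, 0)
    shortage = max(demand - order_up_to, 0)
    return -(holding_cost * leftover + shortage_cost * shortage)


def simulate(periods=400, seed=1):
    """Run the learner against an assumed demand whose mean shifts halfway through."""
    demand_rng = random.Random(seed)
    learner = ActionRewardLearner(actions=range(0, 31, 5), seed=seed)
    for t in range(periods):
        mean = 10 if t < periods // 2 else 20  # nonstationary customer demand
        demand = max(0, round(demand_rng.gauss(mean, 2)))
        action = learner.select_action()
        learner.update(action, period_reward(action, demand))
    return learner


if __name__ == "__main__":
    for level, value in sorted(simulate().values.items()):
        print(f"order-up-to {level:2d}: estimated reward {value:7.2f}")
```

Only the action chosen in a period has its estimate updated here, so estimates for rarely chosen actions can go stale after the demand shift. That lag is the kind of limitation a faster-updating scheme would aim to reduce.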
