A multi-agent reinforcement learning (MARL) technique has been developed to build a supply chain management (SCM) system that performs optimally both for each entity in the chain and for the chain as a whole. Applying MARL to SCM raises two problems: building Markov decision processes for a supply chain, and avoiding learning stagnation similar to that seen in the "prisoner's dilemma." To solve both, a learning management method with deep-neural-network (DNN)-weight evolution (LM-DWE) has been developed. Experiments with a four-agent system were performed using the beer distribution game (BDG) as an example supply chain. As a result, the LM-DWE solved both problems and achieved a total cost 80.0% lower than that of expert BDG players.
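The abstract does not give the environment details, so the following is only a minimal sketch of the kind of four-agent BDG chain whose total cost is being compared. It assumes the classic BDG setup: holding cost 0.5 and backlog cost 1.0 per unit per week, a two-week shipping delay, an initial inventory of 12, and initial shipments of 4. For simplicity, it also drops the order-information delay of the original game. `simulate_bdg`, `pass_through`, and `OrderPolicy` are hypothetical names, not the paper's code. Each agent's order decision is the point where a learned policy would plug in.

```python
from collections import deque
from typing import Callable, List

# Assumed classic BDG cost parameters (per unit per week), not values from the paper.
HOLDING_COST = 0.5
BACKLOG_COST = 1.0
NUM_STAGES = 4  # retailer, wholesaler, distributor, factory

# Hypothetical policy signature: (stage index, inventory, backlog, incoming order) -> order.
OrderPolicy = Callable[[int, int, int, int], int]


def pass_through(stage: int, inventory: int, backlog: int, incoming: int) -> int:
    """Baseline policy: order exactly what the downstream stage ordered."""
    return incoming


def simulate_bdg(customer_demand: List[int], policy: OrderPolicy = pass_through,
                 lead_time: int = 2, initial_inventory: int = 12,
                 initial_shipment: int = 4) -> float:
    """Return the total cost of the whole chain over the demand sequence (lead_time >= 1)."""
    inventory = [initial_inventory] * NUM_STAGES
    backlog = [0] * NUM_STAGES
    # pipeline[i] holds shipments in transit to stage i; each arrives lead_time weeks after sending.
    pipeline = [deque([initial_shipment] * lead_time) for _ in range(NUM_STAGES)]
    total_cost = 0.0
    for demand in customer_demand:
        incoming = demand  # order arriving at the retailer this week
        for i in range(NUM_STAGES):
            inventory[i] += pipeline[i].popleft()  # receive this week's shipment
            owed = incoming + backlog[i]
            shipped = min(inventory[i], owed)      # ship as much as is on hand
            inventory[i] -= shipped
            backlog[i] = owed - shipped
            if i > 0:
                pipeline[i - 1].append(shipped)    # goods travel downstream
            total_cost += HOLDING_COST * inventory[i] + BACKLOG_COST * backlog[i]
            # This stage's order becomes the next upstream stage's incoming order
            # (no order-information delay in this simplified sketch).
            incoming = max(0, policy(i, inventory[i], backlog[i], incoming))
        pipeline[-1].append(incoming)  # the factory's order enters production
    return total_cost


if __name__ == "__main__":
    # Steady demand of 4: every stage holds 12 units each week, so the cost is 4 * 6.0 per week.
    print(simulate_bdg([4] * 10))
```

The returned value is a chain-wide cost, the quantity the abstract's 80.0% comparison refers to. A per-agent cost (one stage's holding plus backlog terms) would be the local view, and the tension between the two is what the MARL setting has to balance.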