Recently, delayed reinforcement learning (RL) has been proposed as a strong method for learning in multi-agent systems (MASs). In RL, each agent tries to discover an optimal policy, that is, a function mapping states to actions. The most popular RL technique, Q-learning, has been proven to produce an optimal policy under certain conditions. In this paper, we consider a multi-agent cooperation problem and propose a multi-agent reinforcement learning method based on the other agents' actions. In our learning method, the agent under consideration observes the other agents' actions and applies minimax Q-learning with fuzzy state and fuzzy goal representations to update its fuzzy Q values.
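
To make the update rule concrete, the following is a minimal sketch of a tabular minimax-style Q update in which the Q table is indexed by the agent's own action and the observed action of another agent. It is an illustration under stated assumptions, not the method of this paper: the fuzzy state and fuzzy goal representations are not modeled, the state value uses a pure-strategy maximin rather than the mixed-strategy linear program of standard minimax Q-learning, and the names `minimax_value`, `minimax_q_update`, `alpha`, and `gamma` are hypothetical.

```python
import numpy as np

def minimax_value(q_state):
    """Pure-strategy maximin value of a Q matrix indexed [own_action, other_action].

    Simplifying assumption: the worst case over the other agent's actions is
    taken per own action, then the best own action is chosen.
    """
    return q_state.min(axis=1).max()

def minimax_q_update(Q, s, a, o, reward, s_next, alpha=0.1, gamma=0.9):
    """Move Q[s, a, o] toward reward + gamma * V(s_next).

    s, s_next: crisp (non-fuzzy) state indices, an assumption of this sketch
    a: the agent's own action, o: the observed action of the other agent
    alpha: learning rate, gamma: discount factor (both hypothetical defaults)
    """
    target = reward + gamma * minimax_value(Q[s_next])
    Q[s, a, o] += alpha * (target - Q[s, a, o])
    return Q

# Hypothetical toy problem: 2 states, 2 own actions, 2 observed actions.
Q = np.zeros((2, 2, 2))
Q = minimax_q_update(Q, s=0, a=1, o=0, reward=1.0, s_next=1)
print(Q[0, 1, 0])
```

Indexing the table by the observed action `o` is what lets the agent condition its estimates on what the other agents do; a fuzzy variant would presumably weight such updates by state membership degrees, which this sketch leaves out.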