The present invention is a random access protocol for a wireless communication network in which N (N > 1) nodes attempt to transmit data frames at the same time. Using the multi-agent multi-armed bandit (MAB) algorithm, a reinforcement learning method in machine learning, each node operates in a time-synchronized system composed of time slots of a fixed size, separates the time at which it attempts to transmit from the times used by the other nodes, and learns an optimal transmission method in which the nodes do not collide with one another.
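The slot-learning idea described above can be illustrated with a minimal sketch. The abstract does not specify the bandit policy, the reward, or the frame structure, so everything here is an assumption made for illustration: each node is treated as an independent epsilon-greedy agent, the arms are the slots of a repeating frame, and the reward is 1 when a node is alone in its slot and 0 on a collision. The names `successful_nodes` and `simulate` and all parameter values are hypothetical.

```python
import random
from collections import Counter


def successful_nodes(choices):
    """Return the indices of nodes that were alone in their chosen slot."""
    occupancy = Counter(choices)
    return {node for node, slot in enumerate(choices) if occupancy[slot] == 1}


def simulate(num_nodes=4, num_slots=6, num_frames=3000,
             epsilon=0.1, step_size=0.1, seed=0):
    """Run independent epsilon-greedy learners; return each node's preferred slot.

    Assumed model (not taken from the source): one transmission attempt per
    node per frame, reward 1 for a collision-free slot and 0 otherwise.
    """
    rng = random.Random(seed)
    # q[node][slot] is the node's running estimate of success in that slot.
    q = [[0.0] * num_slots for _ in range(num_nodes)]

    for _ in range(num_frames):
        choices = []
        for node in range(num_nodes):
            if rng.random() < epsilon:
                # Explore: try a uniformly random slot.
                slot = rng.randrange(num_slots)
            else:
                # Exploit: pick the best-valued slot, breaking ties at random.
                best = max(q[node])
                slot = rng.choice(
                    [s for s, v in enumerate(q[node]) if v == best]
                )
            choices.append(slot)

        winners = successful_nodes(choices)
        for node, slot in enumerate(choices):
            reward = 1.0 if node in winners else 0.0
            # Constant step-size update toward the observed reward.
            q[node][slot] += step_size * (reward - q[node][slot])

    # Greedy slot for each node after learning.
    return [
        max(range(num_slots), key=lambda s: q[node][s])
        for node in range(num_nodes)
    ]


if __name__ == "__main__":
    print(simulate())
```

Each node learns only from its own success or collision feedback, with no coordination messages, which matches the multi-agent framing of the abstract. The sketch does not guarantee that the learned slots are distinct within a finite number of frames; it only shows the kind of update each node could perform.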