We consider the problem of dynamic spectrum access for network utility maximization in multichannel wireless networks. The shared bandwidth is divided into K orthogonal channels. At the beginning of each time slot, each user selects a channel and transmits a packet with a certain attempt probability. After each time slot, each user that has transmitted a packet receives a local observation indicating whether its packet was successfully delivered or not (i.e., an ACK signal). The objective is to find a multi-user strategy for accessing the spectrum that maximizes a certain network utility in a distributed manner, without online coordination or message exchanges between users. Obtaining an optimal solution for the spectrum access problem is computationally expensive in general due to the large state space and the partial observability of the states. To tackle this problem, we develop a novel distributed dynamic spectrum access algorithm based on deep multi-user reinforcement learning. Specifically, at each time slot, each user maps its current state to spectrum access actions based on a trained deep Q-network used to maximize the objective function. A game-theoretic analysis of the system dynamics is developed to establish design principles for the implementation of the algorithm. Experimental results demonstrate strong performance of the algorithm.
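The per-slot access model described above can be illustrated with a minimal sketch. It is not the proposed algorithm: the uniformly random channel choice stands in for the trained deep Q-network, and the collision rule (a packet is delivered only when its sender is alone on the channel) is an assumption that the abstract does not state. The names `sample_actions` and `ack_signals` are hypothetical.

```python
import random
from collections import Counter


def sample_actions(num_users, num_channels, attempt_prob, rng):
    """Pick one action per user for a single time slot.

    Placeholder policy (assumption): each user picks a channel uniformly at
    random and transmits with probability attempt_prob. Returns a list holding
    a channel index per user, or None when that user stays idle.
    """
    actions = []
    for _ in range(num_users):
        channel = rng.randrange(num_channels)
        transmits = rng.random() < attempt_prob
        actions.append(channel if transmits else None)
    return actions


def ack_signals(actions):
    """Compute each user's local observation after the slot.

    Assumed collision model: ACK is 1 when the user is the only transmitter
    on its channel, 0 when at least one other user shares the channel, and
    None when the user did not transmit (no observation).
    """
    load = Counter(a for a in actions if a is not None)
    return [None if a is None else int(load[a] == 1) for a in actions]


# One simulated slot with 4 users sharing K = 2 channels.
rng = random.Random(0)
actions = sample_actions(num_users=4, num_channels=2, attempt_prob=0.5, rng=rng)
acks = ack_signals(actions)
```

Each user sees only its own entry of `acks`, which reflects the partial observability mentioned above: a user learns whether its packet got through, but not what the other users chose.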