In this study, downlink scheduling of multiuser traffic with hard deadlines and packet-level priorities is cast as a partially observable Markov decision process. User channels are modeled as Markovian, and the base station can learn the channel condition of only the currently scheduled user. Optimizing joint channel learning and scheduling combines the challenges posed by the strict deadline constraints of real-time traffic with those posed by the partial observability of multiuser channels. In particular, we show that idling adds a new dimension to the action space and, through a case study of heterogeneous multiuser networks, that idling is indeed the optimal action in certain system states. This somewhat surprising result reveals fundamental tradeoffs between exploitation and exploration/idling that go beyond the classic "exploitation vs. exploration" tradeoff. We find that, due to hard deadlines and packet priorities, idling is intimately related to the tradeoff between the successful transmission of backlogged packets and that of future arrivals. In contrast, for the special case of a symmetric two-user system, we show that the scheduling problem exhibits unique structures that render a non-idling greedy policy optimal.
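The partial observability described above can be illustrated with a minimal belief-tracking sketch. It is not the paper's formulation: it assumes each user channel is a two-state (good/bad) Markov chain with shared transition probabilities, and the names `propagate`, `update_beliefs`, `p_gg`, and `p_bg` are hypothetical. The base station observes only the scheduled user's channel; for every other user, and for all users when it idles, the belief simply evolves under the Markov chain.

```python
from typing import List, Optional


def propagate(belief: float, p_gg: float, p_bg: float) -> float:
    """Predict the probability that a channel is good in the next slot.

    belief: current probability that the channel is good.
    p_gg:   assumed P(good -> good); p_bg: assumed P(bad -> good).
    """
    return belief * p_gg + (1.0 - belief) * p_bg


def update_beliefs(
    beliefs: List[float],
    scheduled: Optional[int],
    observed_good: bool,
    p_gg: float,
    p_bg: float,
) -> List[float]:
    """Return next-slot beliefs for all users.

    scheduled is the index of the scheduled user, or None when idling.
    Only the scheduled user's channel state is revealed; all other
    beliefs evolve without any new observation.
    """
    next_beliefs = []
    for user, belief in enumerate(beliefs):
        if scheduled is not None and user == scheduled:
            # The observation collapses the belief to 0 or 1 before prediction.
            belief = 1.0 if observed_good else 0.0
        next_beliefs.append(propagate(belief, p_gg, p_bg))
    return next_beliefs
```

Under these assumptions, scheduling a user sharpens the belief about that user's channel, while idling lets every belief drift toward the stationary distribution. A full scheduler would additionally have to track packet deadlines and priorities in its state, which this sketch leaves out.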