We propose a method for speeding up reinforcement learning of policies for spoken dialogue systems. The method learns the value of applying actions only in a selected subset of states; the values of unsampled states are approximated by linear interpolation between the values of known states. Experiments show that the improved algorithm speeds up the learning of dialogue policies.
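
The interpolation step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional state feature in [0, 1], and the sampled grid, the action names (`ask_again`, `confirm`), and the value table are all hypothetical numbers chosen for illustration. Values are stored only at the sampled states, and the value of any other state is obtained by linear interpolation between its two nearest sampled neighbours.

```python
from bisect import bisect_right

# Hypothetical sampled states along a single state feature in [0, 1].
SAMPLED_STATES = [0.0, 0.25, 0.5, 0.75, 1.0]

# Hypothetical learned action values, one entry per sampled state.
Q_VALUES = {
    "ask_again": [0.0, 1.0, 2.0, 3.0, 4.0],
    "confirm": [4.0, 3.0, 2.0, 1.0, 0.0],
}


def interpolated_q(state, action):
    """Approximate Q(state, action) by linear interpolation of sampled states."""
    xs, ys = SAMPLED_STATES, Q_VALUES[action]
    # Outside the sampled range, fall back to the nearest sampled value.
    if state <= xs[0]:
        return ys[0]
    if state >= xs[-1]:
        return ys[-1]
    hi = bisect_right(xs, state)  # index of the first sampled state > state
    lo = hi - 1                   # index of the last sampled state <= state
    weight = (state - xs[lo]) / (xs[hi] - xs[lo])
    return (1.0 - weight) * ys[lo] + weight * ys[hi]


def greedy_action(state):
    """Return the action with the highest interpolated value in this state."""
    return max(Q_VALUES, key=lambda action: interpolated_q(state, action))
```

For example, `greedy_action(0.125)` compares values interpolated halfway between the first two sampled states, although no value was ever learned for the state 0.125 itself.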