In one approach, a processor trains a model, via a reinforcement learning process, to produce a first action function that relates states of a natural language based response environment to actions applicable to that environment. A processor then retrains the model, via the reinforcement learning process, to produce a second action function through iterations of: applying the first action function to a current state representation of the natural language based response environment to obtain a ground-truth action representation; emphasizing a word of the current state representation, based on its relevancy to the ground-truth action representation, to obtain a modified state representation; applying the model to the modified state representation to obtain an untrained action representation; and submitting the untrained action representation to the natural language based response environment to obtain a subsequent state representation, which becomes the current state representation for the next iteration.
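The retraining loop can be illustrated with a minimal Python sketch. Everything concrete here is a hypothetical stand-in, not the described method: a toy text environment (`TRANSITIONS`, `env_step`), a lookup table playing the role of the first action function (`first_action_function`), word-overlap relevancy with `*asterisk*` markers as the emphasis step (`emphasize`), and a tabular model (`TabularModel`). The update rule (reward 1 when the model's action matches the ground-truth action, otherwise copy the ground-truth action into the table) is an assumed imitation-style signal chosen for brevity.

```python
from typing import Dict, List, Tuple

# Hypothetical toy environment: (state, action) -> next state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("you see a key and a locked door", "take key"): "you hold the key by a locked door",
    ("you hold the key by a locked door", "unlock door"): "the door is open",
}


def env_step(state: str, action: str) -> str:
    """Submit an action; unrecognised actions leave the state unchanged."""
    return TRANSITIONS.get((state, action), state)


# Hypothetical stand-in for the first action function from the initial training.
FIRST_POLICY: Dict[str, str] = {
    "you see a key and a locked door": "take key",
    "you hold the key by a locked door": "unlock door",
}


def first_action_function(state: str) -> str:
    return FIRST_POLICY.get(state, "look")


def emphasize(state: str, ground_truth: str) -> str:
    """Mark state words that also occur in the ground-truth action (assumed relevancy)."""
    relevant = set(ground_truth.split())
    return " ".join(f"*{w}*" if w in relevant else w for w in state.split())


class TabularModel:
    """Hypothetical model mapping modified state representations to actions."""

    def __init__(self) -> None:
        self.table: Dict[str, str] = {}

    def act(self, modified_state: str) -> str:
        return self.table.get(modified_state, "look")

    def update(self, modified_state: str, reward: float, target: str) -> None:
        # Assumed update rule: on zero reward, adopt the ground-truth action.
        if reward == 0.0:
            self.table[modified_state] = target


def retrain(model: TabularModel, start: str, iterations: int) -> List[str]:
    """Run the retraining iterations and return the visited states."""
    state = start
    trace = [state]
    for _ in range(iterations):
        ground_truth = first_action_function(state)   # ground-truth action
        modified = emphasize(state, ground_truth)     # modified state
        action = model.act(modified)                  # untrained action
        reward = 1.0 if action == ground_truth else 0.0
        model.update(modified, reward, ground_truth)
        state = env_step(state, action)               # subsequent state becomes current
        trace.append(state)
    return trace


if __name__ == "__main__":
    model = TabularModel()
    for s in retrain(model, "you see a key and a locked door", 4):
        print(s)
```

In this toy run the model first answers "look", is corrected, and reaches "the door is open" after four iterations. After retraining, `model.act` plays the role of the second action function; because it is keyed on the emphasized state, the marked words are what distinguish one situation from another.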