This disclosure relates to reinforcement learning. The method includes: maintaining respective episodic memory data for each of a plurality of actions; receiving a current observation that characterizes the current state of an environment with which an agent is interacting; processing the current observation using an embedding neural network, in accordance with the current values of the parameters of the embedding neural network, to generate a current key embedding for the current observation; for each action of the plurality of actions, determining, according to a distance measure, the p nearest-neighbor key embeddings to the current key embedding in the episodic memory data for the action, and determining a Q value for the action from the return estimates mapped to by those p nearest-neighbor key embeddings in the episodic memory data for the action; and selecting, using the Q values for the actions, an action from the plurality of actions as the action to be performed by the agent.
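The action-selection steps described in the abstract can be sketched roughly as follows. This is a minimal illustration only, not the claimed implementation: the function names (`embed`, `q_value`, `select_action`), the single tanh layer standing in for the embedding neural network, the squared Euclidean distance, and the inverse-distance weighting used to combine return estimates are all assumptions, since the abstract does not specify the network architecture, the distance measure, or how the return estimates are combined. Each action's episodic memory is assumed to be a non-empty list of (key embedding, return estimate) pairs.

```python
import math


def embed(observation, weights):
    # Hypothetical stand-in for the embedding neural network:
    # one linear layer followed by tanh, parameterized by `weights`.
    return [math.tanh(sum(w * x for w, x in zip(row, observation)))
            for row in weights]


def squared_distance(a, b):
    # Assumed distance measure: squared Euclidean distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def q_value(key, memory, p, delta=1e-3):
    # memory: non-empty list of (key_embedding, return_estimate) pairs
    # stored for a single action.
    # Keep the p stored keys closest to the current key embedding.
    nearest = sorted((squared_distance(key, k), r) for k, r in memory)[:p]
    # Assumed combination rule: inverse-distance weighted average
    # of the return estimates mapped to by the nearest keys.
    kernel = [1.0 / (d + delta) for d, _ in nearest]
    total = sum(kernel)
    return sum(w * r for w, (_, r) in zip(kernel, nearest)) / total


def select_action(observation, weights, memories, p=2):
    # memories: one episodic memory per action, indexed by action id.
    key = embed(observation, weights)
    q_values = [q_value(key, memory, p) for memory in memories]
    # Greedy selection over the per-action Q values.
    best = max(range(len(q_values)), key=q_values.__getitem__)
    return best, q_values
```

Calling `select_action(observation, weights, memories, p)` returns the index of the greedy action together with the per-action Q values. An exploration policy such as epsilon-greedy could be layered on top, although the abstract says only that the action is selected using the Q values.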