Provided are a reinforcement learning device, a reinforcement learning system, an object manipulation device, a model generation method, and a reinforcement learning program that can increase the probability that a prescribed manipulation of an object succeeds. The reinforcement learning device has at least one memory and at least one processor. The at least one processor is configured to: input, to a training model that outputs information for controlling the operation of an end effector, information relating to a captured image captured by an imaging device whose position, orientation, or both change, and information relating to a target object image indicating an object to be manipulated by the end effector; and update a parameter of the training model on the basis of the result of manipulating the object when the operation of the end effector is controlled on the basis of the information output by the training model.
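The two processor steps described above (input to the training model, then a parameter update based on the manipulation result) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the disclosure: the names `LinearPolicy`, `simulate_manipulation`, and `train` are hypothetical, the image information is reduced to short feature vectors, the end effector is replaced by a toy success check, and the update rule is a plain REINFORCE-style policy gradient, which the abstract does not specify.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LinearPolicy:
    """Hypothetical stand-in for the training model (a linear softmax policy)."""

    def __init__(self, n_features, n_actions):
        self.weights = [[0.0] * n_features for _ in range(n_actions)]

    def action_probs(self, captured, target):
        # Input step: captured-image features and target-object-image
        # features are concatenated and fed to the model together.
        x = captured + target
        scores = [sum(w * v for w, v in zip(row, x)) for row in self.weights]
        return softmax(scores)

    def update(self, captured, target, action, reward, lr=0.5):
        # Update step: REINFORCE gradient of log pi(action | x), scaled by
        # the manipulation result (assumed here: 1.0 success, 0.0 failure).
        x = captured + target
        probs = self.action_probs(captured, target)
        for a, row in enumerate(self.weights):
            coeff = (1.0 if a == action else 0.0) - probs[a]
            for i, v in enumerate(x):
                row[i] += lr * reward * coeff * v


def simulate_manipulation(target, action):
    """Toy environment: the manipulation succeeds if the chosen action
    index matches the object marked in the one-hot target vector."""
    return 1.0 if action == target.index(1.0) else 0.0


def train(episodes=500, seed=0, n_objects=3):
    rng = random.Random(seed)
    policy = LinearPolicy(n_features=2 * n_objects, n_actions=n_objects)
    for _ in range(episodes):
        goal = rng.randrange(n_objects)
        target = [1.0 if i == goal else 0.0 for i in range(n_objects)]
        # Random values mimic a view that varies with the camera pose.
        captured = [rng.random() for _ in range(n_objects)]
        probs = policy.action_probs(captured, target)
        action = rng.choices(range(n_objects), weights=probs)[0]
        reward = simulate_manipulation(target, action)
        policy.update(captured, target, action, reward)
    return policy
```

Calling `train()` returns a policy whose `action_probs` should favor the action matching the target vector. In a real system the feature vectors would presumably come from image encoders and the reward from the observed outcome of the end effector's operation; the sketch only shows how the manipulation result drives the parameter update.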