A machine learning device for learning a motion of a robot engaged in a task that a human and the robot perform in cooperation with each other. The device includes a state observation unit that observes a state variable indicating a state of the robot while the human and the robot cooperate to perform the task; a reward calculation unit that calculates a reward based on control data for controlling the robot, on the state variable, and on an action of the human; and a value function update unit that updates an action value function for controlling the motion of the robot, based on the reward and the state variable.
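The abstract names an action value function but does not fix the learning rule. As a minimal sketch, assuming tabular Q-learning with the update Q(s, a) ← Q(s, a) + α(r + γ max_a' Q(s', a') − Q(s, a)), the three units could map onto one class as below. Every identifier here (`MachineLearningDevice`, `observe_state`, `calculate_reward`, `update_value`, `best_action`, `human_resisted`, `control_effort`) and the reward shaping itself are hypothetical illustrations, not the patented method.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MachineLearningDevice:
    """Hypothetical tabular Q-learning sketch of the three units in the abstract."""
    actions: list
    alpha: float = 0.1  # learning rate (assumed value)
    gamma: float = 0.9  # discount factor (assumed value)
    q: dict = field(default_factory=lambda: defaultdict(float))  # Q(s, a), default 0.0

    def observe_state(self, robot_state):
        # State observation unit: discretize continuous robot readings
        # (e.g. joint positions) into a hashable state key.
        return tuple(round(x, 1) for x in robot_state)

    def calculate_reward(self, human_resisted: bool, control_effort: float) -> float:
        # Reward calculation unit (assumed shaping): reward cooperation when the
        # human's action does not resist the robot, and penalize control effort.
        base = -1.0 if human_resisted else 1.0
        return base - 0.1 * control_effort

    def update_value(self, s, a, r, s_next) -> float:
        # Value function update unit: one-step Q-learning (temporal-difference) update.
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        td_error = r + self.gamma * best_next - self.q[(s, a)]
        self.q[(s, a)] += self.alpha * td_error
        return self.q[(s, a)]

    def best_action(self, s):
        # Greedy action with respect to the current action value function.
        return max(self.actions, key=lambda a: self.q[(s, a)])
```

For example, starting from an all-zero table, a single transition with reward 1.0 moves Q(s, "slow") to α · 1.0 = 0.1, which makes "slow" the greedy action in that state. A real cooperative-robot system would likely need function approximation rather than a lookup table, since robot states are continuous.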