An information processing apparatus that optimizes a policy in a transition model in which the number of targeted objects in each state transits according to the policy includes a cost constraint acquisition unit configured to acquire a cost constraint that constrains a total cost of the policy; a mass policy setting unit configured to set the number of objects targeted by a mass policy in each state, based on the predefined number of objects to belong to each state and a reach rate at which the mass policy reaches to an object, with respect to the mass policy collectively executed for the object in two or more states; and a processing unit configured to assume the reach rate of the mass policy as a variable of an optimization and maximize an objective function based on a total reward in a whole period while satisfying the cost constraint.
展开▼