This behavior learning device comprises: a behavior candidate acquisition unit that extracts a plurality of available behavior candidates on the basis of situation information data indicating the environment and the device's own situation; a score acquisition unit that acquires, for each of the plurality of behavior candidates, a score that is an index of the predicted effect of conducting that behavior; a behavior selection unit that selects, from among the plurality of behavior candidates, the behavior candidate having the highest score; and a score adjustment unit that adjusts the value of the score associated with the selected behavior candidate on the basis of the result of conducting the selected behavior in the environment.
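The four units described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class name `BehaviorLearner`, the `candidates_for` callback, the score table keyed by (situation, behavior), and in particular the update rule (moving the stored score a fraction `learning_rate` toward the observed effect) are all assumptions made for illustration. The abstract states only that the score is adjusted on the basis of the result.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Situation = Hashable  # hypothetical encoding of the situation information data
Behavior = str        # hypothetical encoding of a behavior candidate


class BehaviorLearner:
    """Illustrative sketch of the four units; all names are hypothetical."""

    def __init__(
        self,
        candidates_for: Callable[[Situation], List[Behavior]],
        learning_rate: float = 0.5,
        default_score: float = 0.0,
    ) -> None:
        self.candidates_for = candidates_for
        self.learning_rate = learning_rate
        self.default_score = default_score
        # Score table: predicted effect of each behavior in each situation.
        self.scores: Dict[Tuple[Situation, Behavior], float] = {}

    def acquire_candidates(self, situation: Situation) -> List[Behavior]:
        # Behavior candidate acquisition unit: behaviors available right now.
        return list(self.candidates_for(situation))

    def acquire_score(self, situation: Situation, behavior: Behavior) -> float:
        # Score acquisition unit: look up the predicted effect,
        # falling back to a default for behaviors not yet tried.
        return self.scores.get((situation, behavior), self.default_score)

    def select(self, situation: Situation) -> Behavior:
        # Behavior selection unit: pick the highest-scoring candidate.
        # On ties, max() returns the first such candidate in list order.
        candidates = self.acquire_candidates(situation)
        return max(candidates, key=lambda b: self.acquire_score(situation, b))

    def adjust(
        self, situation: Situation, behavior: Behavior, observed_effect: float
    ) -> None:
        # Score adjustment unit (assumed rule): move the stored score
        # part of the way toward the effect observed in the environment.
        old = self.acquire_score(situation, behavior)
        new = old + self.learning_rate * (observed_effect - old)
        self.scores[(situation, behavior)] = new
```

In use, the device would repeat a cycle of `select`, conducting the chosen behavior in the environment, and `adjust` with the observed result. A behavior that turns out worse than predicted loses score, so another candidate may be selected the next time the same situation arises.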