SYSTEM AND METHODS FOR INTRINSIC REWARD REINFORCEMENT LEARNING
Abstract
A learning agent is disclosed that receives data in sequence from one or more sequential data sources; generates a model modelling sequences of data and actions; and selects an action maximizing the expected future value of a reward function, wherein the reward function depends at least partly on at least one of: a measure of the change in complexity of the model, or a measure of the complexity of the change in the model. The measure of the change in complexity of the model may be based on, for example, the change in description length of the first part of a two-part code describing one or more sequences of received data and actions, the change in description length of a statistical distribution modelling, the description length of the change in the first part of the two-part code, or the description length of the change in the statistical distribution modelling.
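As a rough illustration of the first reward variant (a measure of the change in complexity of the model), the sketch below scores each new observation by how many bits the model's own description grows. Everything in it is an assumption made for illustration and is not taken from the patent: the bigram count table standing in for "the model", the toy cost function `model_description_length`, the class `BigramModel`, and the one-step `greedy_action` helper, which simplifies "expected future value" to a single step and assumes each action's resulting symbol is known in advance.

```python
import copy
import math
from collections import Counter


def model_description_length(counts: Counter, alphabet_size: int) -> float:
    """Toy cost, in bits, of writing down the model (the first part of a two-part code).

    Assumption: each stored bigram costs two symbols plus an Elias-gamma-style
    integer code for its count, i.e. 2*floor(log2(n)) + 1 bits.
    """
    bits_per_symbol = math.log2(alphabet_size)
    total = 0.0
    for count in counts.values():
        total += 2 * bits_per_symbol + 2 * math.floor(math.log2(count)) + 1
    return total


class BigramModel:
    """Hypothetical sequence model: counts of adjacent symbol pairs."""

    def __init__(self, alphabet_size: int):
        self.alphabet_size = alphabet_size
        self.counts = Counter()
        self.prev = None

    def update(self, symbol) -> float:
        """Absorb one symbol and return the intrinsic reward:
        the change in the model's description length caused by the update."""
        before = model_description_length(self.counts, self.alphabet_size)
        if self.prev is not None:
            self.counts[(self.prev, symbol)] += 1
        self.prev = symbol
        after = model_description_length(self.counts, self.alphabet_size)
        return after - before


def greedy_action(model: BigramModel, predicted_symbol: dict):
    """One-step stand-in for action selection: pick the action whose
    (assumed known) resulting symbol yields the largest intrinsic reward."""
    def reward_if(action):
        trial = copy.deepcopy(model)  # evaluate on a copy, leave the real model untouched
        return trial.update(predicted_symbol[action])
    return max(predicted_symbol, key=reward_if)


model = BigramModel(alphabet_size=2)
rewards = [model.update(s) for s in "ababab"]
choice = greedy_action(model, {"stay": "b", "flip": "a"})
```

In this toy setting the reward is large when a previously unseen pair forces a new table entry and falls to zero once further repetitions no longer lengthen the encoded counts, so the greedy helper prefers the action that leads to an unseen pair. The second variant named in the abstract, the complexity of the change in the model, would instead charge for encoding the update itself and is not shown here.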