Method and apparatus for improved reward-based learning using adaptive distance metrics
Abstract
The present invention is a method and an apparatus for reward-based learning of policies for managing or controlling a system or plant. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance metric and a distance-based function approximator estimating long-range expected value are then initialized, where the distance metric computes a distance between two (state, action) pairs. The distance metric and the function approximator are then adjusted so as to minimize a Bellman error measure of the function approximator on the set of exemplars. A management policy is then derived based on the trained distance metric and function approximator.
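The steps in the abstract can be illustrated with a minimal sketch. It is not the patented method: it assumes a diagonally weighted squared Euclidean distance, a Gaussian kernel average as the distance-based function approximator, a mean squared Bellman residual as the error measure, and plain finite-difference gradient descent as the adjustment procedure. All names (`distance`, `q_value`, `bellman_error`, `train`, `greedy_action`) and the toy exemplars are hypothetical.

```python
import math

def distance(w, x, y):
    # Assumed metric: squared Euclidean distance with learnable per-dimension weights.
    return sum(wk * wk * (xk - yk) ** 2 for wk, xk, yk in zip(w, x, y))

def q_value(w, values, points, x):
    # Assumed approximator: Gaussian-kernel weighted average of per-exemplar values.
    kernels = [math.exp(-distance(w, x, p)) for p in points]
    return sum(k * v for k, v in zip(kernels, values)) / sum(kernels)

def bellman_error(params, exemplars, gamma):
    # params = metric weights followed by one stored value per exemplar.
    n_dims = len(exemplars[0][0])
    w, values = params[:n_dims], params[n_dims:]
    points = [x for x, _, _ in exemplars]
    total = 0.0
    for x, reward, x_next in exemplars:
        target = reward
        if x_next is not None:  # None marks a terminal transition
            target += gamma * q_value(w, values, points, x_next)
        total += (q_value(w, values, points, x) - target) ** 2
    return total / len(exemplars)

def train(exemplars, gamma=0.9, steps=200, lr=0.1, eps=1e-5):
    # Jointly adjust metric weights and stored values by finite-difference
    # gradient descent; a step is kept only if it lowers the Bellman error.
    n_dims = len(exemplars[0][0])
    params = [1.0] * n_dims + [0.0] * len(exemplars)
    for _ in range(steps):
        base = bellman_error(params, exemplars, gamma)
        grad = []
        for i in range(len(params)):
            bumped = list(params)
            bumped[i] += eps
            grad.append((bellman_error(bumped, exemplars, gamma) - base) / eps)
        candidate = [p - lr * g for p, g in zip(params, grad)]
        if bellman_error(candidate, exemplars, gamma) < base:
            params = candidate
        else:
            lr *= 0.5  # step too large, shrink it
    return params

def greedy_action(w, values, points, state, actions):
    # Derived policy: pick the action with the highest estimated value.
    return max(actions, key=lambda a: q_value(w, values, points, list(state) + [a]))

# Toy exemplars: ([state, action], immediate reward, next [state, action] or None).
exemplars = [
    ([0.0, 0.0], 0.0, [0.0, 1.0]),
    ([0.0, 1.0], 0.0, [0.5, 1.0]),
    ([0.5, 0.0], 0.0, [0.5, 1.0]),
    ([0.5, 1.0], 1.0, None),
]
gamma = 0.9
initial_params = [1.0, 1.0] + [0.0] * len(exemplars)
initial_error = bellman_error(initial_params, exemplars, gamma)
params = train(exemplars, gamma=gamma)
final_error = bellman_error(params, exemplars, gamma)
w, values = params[:2], params[2:]
points = [x for x, _, _ in exemplars]
chosen = greedy_action(w, values, points, [0.0], [0.0, 1.0])
print(initial_error, final_error, chosen)
```

In this sketch the metric weights and the stored values share one parameter vector, so a single descent loop adjusts both the distance metric and the function approximator against the same Bellman error. An implementation at realistic scale would likely replace the finite-difference gradient with an analytic or automatic one.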