Method and apparatus for improved reward-based learning using nonlinear dimensionality reduction
Abstract
The present invention is a method and an apparatus for reward-based learning of management policies. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance measure between pairs of exemplars is used to compute a Non-Linear Dimensionality Reduction (NLDR) mapping of (state, action) pairs into a lower-dimensional representation, thereby producing embedded exemplars, wherein one or more parameters of the NLDR are tuned to minimize a cross-validation Bellman error on a holdout set taken from the set of one or more exemplars. The mapping is then applied to the set of exemplars, and reward-based learning is applied to the embedded exemplars to obtain a learned management policy.
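The pipeline described in the abstract can be sketched roughly as follows. This is an illustrative sketch under several assumptions, not the patented implementation: Gaussian kernel PCA stands in for the unspecified NLDR, its bandwidth `sigma` is the tuned parameter, LSTD-style policy evaluation stands in for the reward-based learner, the Bellman error is approximated by the mean squared temporal-difference residual, and each exemplar is assumed to be paired with the next (state, action) pair from a trajectory. The toy system, the parameter grid, and all function names are hypothetical.

```python
import numpy as np

def sq_dists(A, B):
    # Distance measure between exemplars: squared Euclidean (an assumption).
    return ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)

def fit_nldr(X, sigma, dim):
    # Gaussian kernel PCA, used here as a stand-in NLDR with parameter sigma.
    K = np.exp(-sq_dists(X, X) / (2 * sigma ** 2))
    n = len(X)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    vals, vecs = np.linalg.eigh(H @ K @ H)
    top = np.argsort(vals)[::-1][:dim]
    alphas = vecs[:, top] / np.sqrt(np.maximum(vals[top], 1e-12))
    return {"X": X, "sigma": sigma, "K": K, "alphas": alphas}

def embed(model, Z):
    # Map (state, action) pairs, including unseen ones, to the low-dim space.
    K = model["K"]
    Kz = np.exp(-sq_dists(Z, model["X"]) / (2 * model["sigma"] ** 2))
    Kz = Kz - Kz.mean(axis=1, keepdims=True) - K.mean(axis=0)[None, :] + K.mean()
    return Kz @ model["alphas"]

def features(model, Z):
    # Embedded exemplar plus a constant bias feature.
    return np.hstack([embed(model, Z), np.ones((len(Z), 1))])

def lstd_q(phi, phi_next, r, gamma, ridge=1e-3):
    # Linear Q-function fitted by LSTD; evaluates the data-collecting policy.
    A = phi.T @ (phi - gamma * phi_next) + ridge * np.eye(phi.shape[1])
    return np.linalg.solve(A, phi.T @ r)

def bellman_error(w, phi, phi_next, r, gamma):
    # Mean squared temporal-difference residual on the given exemplars.
    td = r + gamma * phi_next @ w - phi @ w
    return float(np.mean(td ** 2))

# Hypothetical toy system: scalar state, scalar action, reward = -state^2.
rng = np.random.default_rng(0)
T = 301
s = np.zeros(T)
a = rng.uniform(-1, 1, T)
for t in range(T - 1):
    s[t + 1] = 0.9 * s[t] + 0.3 * a[t] + 0.05 * rng.standard_normal()
X = np.column_stack([s, a])
x, x_next, r = X[:-1], X[1:], -s[:-1] ** 2
gamma = 0.9

# Holdout set taken from the exemplars (every fourth transition).
hold = np.arange(len(x)) % 4 == 0
train = ~hold

# Tune the NLDR parameter to minimize the holdout Bellman error.
scores = {}
for sigma in (0.1, 0.3, 1.0, 3.0):
    m = fit_nldr(x[train], sigma, dim=3)
    w = lstd_q(features(m, x[train]), features(m, x_next[train]), r[train], gamma)
    scores[sigma] = bellman_error(
        w, features(m, x[hold]), features(m, x_next[hold]), r[hold], gamma)
best_sigma = min(scores, key=scores.get)

# Apply the tuned mapping to all exemplars and learn on the embedded exemplars.
model = fit_nldr(x, best_sigma, dim=3)
w = lstd_q(features(model, x), features(model, x_next), r, gamma)

def greedy_action(state, candidates):
    # One step of policy improvement: pick the candidate with the highest Q.
    Z = np.column_stack([np.full(len(candidates), state), candidates])
    return candidates[int(np.argmax(features(model, Z) @ w))]
```

Calling `greedy_action(0.5, np.linspace(-1, 1, 5))` returns one of the five candidate actions. Kernel PCA was chosen for this sketch mainly because it embeds unseen (state, action) pairs directly, which the holdout evaluation requires; other NLDR methods would need a separate out-of-sample extension.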