Method and apparatus for improved reward-based learning using nonlinear dimensionality reduction

Abstract

The present invention is a method and an apparatus for reward-based learning of management policies. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance measure between pairs of exemplars is used to compute a Non-Linear Dimensionality Reduction (NLDR) mapping of (state, action) pairs into a lower-dimensional representation, thereby producing embedded exemplars, wherein one or more parameters of the NLDR are tuned to minimize a cross-validation Bellman error on a holdout set taken from the set of one or more exemplars. The mapping is then applied to the set of exemplars, and reward-based learning is applied to the embedded exemplars to obtain a learned management policy.
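
The pipeline described in the abstract can be illustrated with a minimal, standard-library-only Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the Euclidean distance measure, the toy 1-D Isomap-style embedding standing in for the NLDR step (with the neighbor count `k` as the tuned parameter), the synthetic trajectory data, and the nearest-neighbor fitted value estimate standing in for the reward-based learner. The sketch evaluates a fixed behavior rather than optimizing a policy, so it shows only how a holdout Bellman error might drive the choice of an NLDR parameter.

```python
import math
import random

def euclid(p, q):
    """Distance measure between two (state, action) vectors (assumed Euclidean)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def isomap_1d(points, k, iters=100):
    """Toy 1-D Isomap: k-NN graph, geodesic distances, top classical-MDS axis."""
    n = len(points)
    inf = float("inf")
    d = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: euclid(points[i], points[j]))
        for j in others[:k]:
            d[i][j] = d[j][i] = euclid(points[i], points[j])
    for m in range(n):  # Floyd-Warshall shortest paths over the k-NN graph
        for i in range(n):
            for j in range(n):
                if d[i][m] + d[m][j] < d[i][j]:
                    d[i][j] = d[i][m] + d[m][j]
    cap = 2.0 * max(x for row in d for x in row if x < inf)  # guard: disconnected graph
    sq = [[min(x, cap) ** 2 for x in row] for row in d]
    mean = [sum(row) / n for row in sq]  # row means equal column means (sq is symmetric)
    total = sum(mean) / n
    b = [[-0.5 * (sq[i][j] - mean[i] - mean[j] + total) for j in range(n)]
         for i in range(n)]
    rng = random.Random(0)
    v = [rng.random() - 0.5 for _ in range(n)]
    for _ in range(iters):  # power iteration for the dominant eigenvector of b
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    return [math.sqrt(max(lam, 0.0)) * x for x in v]

def fit_q(z, train, gamma, m=3, sweeps=30):
    """Fitted value estimate on embedded exemplars via m-nearest-neighbor averaging."""
    q = {i: 0.0 for i, _, _ in train}
    def predict(zi):
        near = sorted(q, key=lambda i: abs(z[i] - zi))[:m]
        return sum(q[i] for i in near) / len(near)
    for _ in range(sweeps):
        # Bellman backup: target = immediate reward + discounted successor estimate
        q.update({i: r + gamma * predict(z[j]) for i, r, j in train})
    return predict

def bellman_error(predict, z, data, gamma):
    """Mean squared Bellman residual over a set of (index, reward, next index)."""
    return sum((r + gamma * predict(z[j]) - predict(z[i])) ** 2
               for i, r, j in data) / len(data)

# Hypothetical exemplars: (state, action) vectors along an arc, with rewards.
n, gamma = 40, 0.9
points = [(math.cos(0.1 * t), math.sin(0.1 * t), 0.1 * (t % 2)) for t in range(n)]
transitions = [(t, math.sin(0.1 * t), t + 1) for t in range(n - 1)]
holdout = [tr for tr in transitions if tr[0] % 4 == 3]
train = [tr for tr in transitions if tr[0] % 4 != 3]

# Tune the NLDR parameter k by the Bellman error on the holdout set.
errors = {}
for k in (2, 4, 8):
    z = isomap_1d(points, k)
    errors[k] = bellman_error(fit_q(z, train, gamma), z, holdout, gamma)
best_k = min(errors, key=errors.get)

# Apply the chosen mapping to all exemplars and learn on the embedded exemplars.
z_best = isomap_1d(points, best_k)
q_hat = fit_q(z_best, transitions, gamma)
print(errors, best_k)
```

In this sketch all points are embedded together because the embedding step uses no reward information; only the value fit respects the train/holdout split. A practical system would likely need an out-of-sample extension of the mapping and a control-oriented learner, neither of which is shown here.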