
Method and apparatus for improved reward-based learning using nonlinear dimensionality reduction


Abstract

The present invention is a method and an apparatus for reward-based learning of management policies. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance measure between pairs of exemplars is used to compute a Non-Linear Dimensionality Reduction (NLDR) mapping of (state, action) pairs into a lower-dimensional representation, thereby producing embedded exemplars, wherein one or more parameters of the NLDR are tuned to minimize a cross-validation Bellman error on a holdout set taken from the set of one or more exemplars. The mapping is then applied to the set of exemplars, and reward-based learning is applied to the embedded exemplars to obtain a learned management policy.
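The steps in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation. The exemplars are synthetic and consecutive in time, so exemplar t+1 is treated as the successor of exemplar t. The distance measure is Euclidean. The NLDR step is a crude stand-in in the spirit of Isomap: each exemplar is mapped to its shortest-path distance from a landmark exemplar on a k-nearest-neighbour graph, and the tuned NLDR parameter is k. The reward-based learner is a fitted value iteration with a linear value function. All function names, the discount factor, and the candidate values of k are hypothetical.

```python
import heapq
import math

GAMMA = 0.8  # assumed discount factor, not taken from the source


def euclidean(p, q):
    """Distance measure between two raw (state, action) feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def embed(points, k, landmark=0):
    """1-D embedding: shortest-path distance to a landmark on a k-NN graph."""
    n = len(points)
    nbrs = [[] for _ in range(n)]
    for i in range(n):
        dists = sorted((euclidean(points[i], points[j]), j)
                       for j in range(n) if j != i)
        for d, j in dists[:k]:
            nbrs[i].append((j, d))
            nbrs[j].append((i, d))  # make the graph undirected
    geo = [math.inf] * n
    geo[landmark] = 0.0
    heap = [(0.0, landmark)]
    while heap:  # Dijkstra's algorithm
        d, i = heapq.heappop(heap)
        if d > geo[i]:
            continue
        for j, w in nbrs[i]:
            if d + w < geo[j]:
                geo[j] = d + w
                heapq.heappush(heap, (geo[j], j))
    return geo


def fit_line(zs, ys):
    """Ordinary least squares for y = w0 + w1 * z."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    var = sum((z - mz) ** 2 for z in zs)
    w1 = sum((z - mz) * (y - my) for z, y in zip(zs, ys)) / var if var > 0 else 0.0
    return my - w1 * mz, w1


def fitted_q(z, rewards, transitions, sweeps=50):
    """Fitted value iteration on embedded exemplars with a linear value function."""
    w0, w1 = 0.0, 0.0
    for _ in range(sweeps):
        targets = [rewards[i] + GAMMA * (w0 + w1 * z[j]) for i, j in transitions]
        w0, w1 = fit_line([z[i] for i, _ in transitions], targets)
    return w0, w1


def bellman_error(z, rewards, transitions, w):
    """Mean squared Bellman residual over the given transitions."""
    w0, w1 = w
    errs = [(rewards[i] + GAMMA * (w0 + w1 * z[j]) - (w0 + w1 * z[i])) ** 2
            for i, j in transitions]
    return sum(errs) / len(errs)


def tune_k(points, rewards, candidates):
    """Pick the k whose embedding gives the lowest holdout Bellman error."""
    transitions = [(t, t + 1) for t in range(len(points) - 1)]
    holdout = transitions[::4]
    train = [tr for tr in transitions if tr not in holdout]
    best = (math.inf, None, None)
    for k in candidates:
        z = embed(points, k)
        if any(math.isinf(v) for v in z):
            continue  # graph is disconnected for this k; skip it
        w = fitted_q(z, rewards, train)
        err = bellman_error(z, rewards, holdout, w)
        if err < best[0]:
            best = (err, k, w)
    return best


# Synthetic exemplars: a 1-D latent quantity observed as points on a 2-D arc.
latent = [0.5 - 0.5 * math.cos(0.3 * t) for t in range(60)]
points = [(math.cos(1.5 * math.pi * u), math.sin(1.5 * math.pi * u)) for u in latent]
rewards = [-u for u in latent]  # assumed immediate reward

best_err, best_k, weights = tune_k(points, rewards, [3, 6, 12, 59])
print("selected k:", best_k, "holdout Bellman error:", best_err)
```

The sketch stops at a value estimate over the embedded (state, action) pairs. To turn it into a policy, one would choose, for each state, the action whose embedded pair has the highest estimated value; that step needs an out-of-sample extension of the embedding, which is omitted here.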


Bibliographic details

Publication number: US8060454B2

Patent type:
Publication date: 2011-11-15

Original format: PDF
Applicant/Patentee: RAJARSHI DAS; GERALD J. TESAURO; KILIAN Q. WEINBERGER

Application number: US20070870698
Inventors: RAJARSHI DAS; GERALD J. TESAURO; KILIAN Q. WEINBERGER

Filing date: 2007-10-11
Classification: G06F15/18
Country: US
Record added: 2022-08-21 17:26:41
