International Journal of Artificial Intelligence Tools: Architectures, Languages, Algorithms

Directed Policy Search for Decision Making Using Relevance Vector Machines



Abstract

Several recent learning approaches in decision making under uncertainty suggest the use of classifiers for representing policies compactly. The space of possible policies, even under such structured representations, is huge and must be searched carefully to avoid computationally expensive policy simulations (rollouts). In our recent work, we proposed a method for directed exploration of policy space using support vector classifiers, whereby rollouts are directed to states around the boundaries between different action choices indicated by the separating hyperplanes in the represented policies. While effective, this method suffers from the growing number of support vectors in the underlying classifiers as the number of training examples increases. In this paper, we propose an alternative method for directed policy search based on relevance vector machines. Relevance vector machines are used both for classification (to represent a policy) and regression (to approximate the corresponding relative action advantage function). Classification is enhanced by anomaly detection for accurate policy representation. Exploiting the internal structure of the regressor, we guide the probing of the state space only to critical areas corresponding to changes of action dominance in the underlying policy. This directed focus on critical parts of the state space iteratively leads to refinement and improvement of the underlying policy and delivers excellent control policies in only a few iterations, while the small number of relevance vectors yields significant computational time savings. We demonstrate the proposed approach and compare it with our previous method on standard reinforcement learning domains (inverted pendulum and mountain car).
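The directed-search loop described in the abstract can be illustrated with a small, self-contained sketch. This is not the paper's implementation: the relevance vector machine is replaced by a simple RBF-weighted vote, the domain is a toy 1-D regulation task standing in for the pendulum/mountain-car benchmarks, and all names and constants are illustrative. It shows the core idea only: estimate per-action values by rollouts, fit a kernel classifier as the policy, then concentrate the next batch of rollouts near the states where the action advantage changes sign.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D task (stand-in for the paper's benchmarks): state x in [-1, 1],
# actions {-1, +1}, reward -|x|, dynamics x' = clip(x + 0.1 * a).
# The optimal policy moves toward the origin: a* = -sign(x).
ACTIONS = (-1, 1)

def step(x, a):
    x2 = np.clip(x + 0.1 * a, -1.0, 1.0)
    return x2, -abs(x2)

def rollout_q(x, a, policy, horizon=20, gamma=0.95):
    """Monte Carlo estimate of Q(x, a): take a once, then follow `policy`."""
    x, r = step(x, a)
    total, discount = r, gamma
    for _ in range(horizon):
        x, r = step(x, policy(x))
        total += discount * r
        discount *= gamma
    return total

def greedy_labels(states, policy):
    """Best action at each probed state (training labels) and the
    relative action advantage Q(x, +1) - Q(x, -1)."""
    qs = np.array([[rollout_q(x, a, policy) for a in ACTIONS] for x in states])
    return np.array(ACTIONS)[qs.argmax(axis=1)], qs[:, 1] - qs[:, 0]

def fit_policy(states, labels):
    """Kernel classifier standing in for the RVM: sign of an RBF-weighted vote."""
    def policy(x):
        w = np.exp(-((states - x) ** 2) / 0.02)
        return 1 if w @ labels >= 0 else -1
    return policy

# Iteration 0: uniform probing of the state space under a random policy.
policy = lambda x: rng.choice(ACTIONS)
states = np.linspace(-1, 1, 21)
for _ in range(3):
    labels, adv = greedy_labels(states, policy)
    policy = fit_policy(states, labels)
    # Directed exploration: place the next batch of rollout states near the
    # current decision boundary, where the advantage is closest to zero,
    # plus a few coverage points so the rest of the space is not forgotten.
    boundary = states[np.argmin(np.abs(adv))]
    states = np.clip(boundary + 0.3 * rng.standard_normal(21), -1, 1)
    states = np.concatenate([states, np.linspace(-1, 1, 9)])

# The learned policy should push the state toward the origin.
print(policy(0.5), policy(-0.5))
```

In this sketch the advantage function is estimated pointwise from the same rollouts that produce the labels; in the paper an RVM regressor approximates it over the whole state space, and the sparsity of the relevance vectors is what keeps the per-iteration cost low as training data accumulates.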

