Source: Conference on Uncertainty in Artificial Intelligence

Value Function Approximation in Noisy Environments Using Locally Smoothed Regularized Approximate Linear Programs

Abstract

Recently, Petrik et al. demonstrated that L_1-Regularized Approximate Linear Programming (RALP) could produce value functions and policies that compared favorably to established linear value function approximation techniques like LSPI. RALP's success primarily stems from its ability to solve the feature selection and value function approximation steps simultaneously. However, RALP's performance guarantees become looser if sampled next states are used. For very noisy domains, RALP requires an accurate model rather than samples, which can be unrealistic in some practical scenarios. In this paper, we demonstrate this weakness and then introduce Locally Smoothed L_1-Regularized Approximate Linear Programming (LS-RALP). We demonstrate that LS-RALP mitigates inaccuracies stemming from noise even without an accurate model. We show that, given some smoothness assumptions, the error from noise approaches zero as the number of samples increases, and we provide experimental examples of LS-RALP's success on common reinforcement learning benchmark problems.
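
To make the contrast between sampled RALP and a locally smoothed variant concrete, the following is a minimal sketch of the two linear programs. It is one plausible reading of the abstract, not the paper's exact formulation. All symbols are assumptions introduced for illustration: samples $(s_i, a_i, r_i, s'_i)$ for $i = 1, \dots, n$, a feature row vector $\Phi(s)$, weights $w$, discount factor $\gamma$, state-relevance weights $\rho$, a regularization budget $\psi$, and a nonnegative smoothing kernel $k_b$ with weights normalized so that $\sum_j k_b(s_i, s_j) = 1$. A plain $L_1$ norm is used here as a simplification.

```latex
% Sampled RALP (sketch): one Bellman constraint per sample,
% each built from a single noisy reward and next state.
\begin{align*}
\min_{w} \quad & \sum_{i=1}^{n} \rho(s_i)\, \Phi(s_i)\, w \\
\text{s.t.} \quad & \Phi(s_i)\, w \;\ge\; r_i + \gamma\, \Phi(s'_i)\, w,
  \qquad i = 1, \dots, n, \\
& \lVert w \rVert_1 \le \psi .
\end{align*}

% Locally smoothed variant (sketch): the right-hand side of each
% constraint is replaced by a kernel-weighted average over nearby
% samples, so that noise in individual samples tends to average out.
\begin{align*}
\min_{w} \quad & \sum_{i=1}^{n} \rho(s_i)\, \Phi(s_i)\, w \\
\text{s.t.} \quad & \Phi(s_i)\, w \;\ge\;
  \sum_{j=1}^{n} k_b(s_i, s_j)\,
  \bigl[\, r_j + \gamma\, \Phi(s'_j)\, w \,\bigr],
  \qquad i = 1, \dots, n, \\
& \lVert w \rVert_1 \le \psi .
\end{align*}
```

In this sketch the smoothed program remains linear in $w$, since the kernel weights depend only on the sampled states. Restricting the average to samples $j$ that share the action $a_i$ is one natural choice, but it is an assumption here rather than something stated in the abstract.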