Conference: International Conference on Machine Learning

Generalization and Exploration via Randomized Value Functions

机译:随机价值函数的泛化与探索

获取原文

Abstract

We propose randomized least-squares value iteration (RLSVI), a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions. We explain why versions of least-squares value iteration that use Boltzmann or ε-greedy exploration can be highly inefficient, and we present computational results that demonstrate dramatic efficiency gains enjoyed by RLSVI. Further, we establish an upper bound on the expected regret of RLSVI that demonstrates near-optimality in a tabula rasa learning context. More broadly, our results suggest that randomized value functions offer a promising approach to tackling a critical challenge in reinforcement learning: synthesizing efficient exploration and effective generalization.
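The abstract does not spell out the update rule, so the following is only a minimal NumPy sketch of the idea for an episodic, finite-horizon setting, under stated assumptions. Working backward over periods, each period's linear Q-function parameters are drawn from a Bayesian linear-regression posterior fitted to bootstrapped targets. The agent then acts greedily with respect to that sampled value function, which replaces Boltzmann or ε-greedy dithering. The prior precision `lam`, noise scale `sigma`, the feature map `phi`, and the toy chain environment are illustrative assumptions, not details taken from the abstract. One-hot features make the sketch a tabular ("tabula rasa") special case.

```python
import numpy as np


def sample_posterior(Phi, y, sigma, lam, rng):
    """Draw theta from a Gaussian Bayesian linear-regression posterior.

    Assumed model: prior theta ~ N(0, I / lam), noise y - Phi @ theta ~ N(0, sigma^2).
    """
    d = Phi.shape[1]
    precision = Phi.T @ Phi / sigma**2 + lam * np.eye(d)
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2.0  # symmetrize against round-off
    mean = cov @ (Phi.T @ y) / sigma**2
    return rng.multivariate_normal(mean, cov)


def rlsvi_sample_values(data, phi, n_actions, horizon, d, sigma, lam, rng):
    """Backward pass: one sampled parameter vector theta_h per period h (sketch).

    data[h] holds (s, a, r, s_next) transitions seen at period h;
    phi(s, a) is a hypothetical feature map returning a length-d vector.
    """
    thetas = [np.zeros(d) for _ in range(horizon)]
    for h in reversed(range(horizon)):
        if not data[h]:
            # No observations yet for this period: sample from the prior.
            thetas[h] = rng.normal(0.0, 1.0 / np.sqrt(lam), size=d)
            continue
        Phi = np.array([phi(s, a) for s, a, _, _ in data[h]])
        targets = []
        for s, a, r, s_next in data[h]:
            # Bootstrapped target: reward plus greedy value under next period's sample.
            future = 0.0
            if h + 1 < horizon:
                future = max(phi(s_next, b) @ thetas[h + 1] for b in range(n_actions))
            targets.append(r + future)
        thetas[h] = sample_posterior(Phi, np.array(targets), sigma, lam, rng)
    return thetas


def greedy_action(theta, phi, s, n_actions):
    """Act greedily with respect to the sampled Q-function phi(s, a) @ theta."""
    return int(np.argmax([phi(s, a) @ theta for a in range(n_actions)]))


def run_chain_demo(n_states=5, horizon=5, episodes=30, seed=0):
    """Hypothetical deterministic chain: action 1 moves right, action 0 moves left.

    Reward 1 whenever the agent lands on the rightmost state; one-hot features.
    """
    rng = np.random.default_rng(seed)
    n_actions = 2
    d = n_states * n_actions

    def phi(s, a):
        x = np.zeros(d)
        x[s * n_actions + a] = 1.0
        return x

    data = [[] for _ in range(horizon)]
    returns = []
    for _ in range(episodes):
        # Resample a randomized value function at the start of every episode.
        thetas = rlsvi_sample_values(data, phi, n_actions, horizon, d,
                                     sigma=0.1, lam=1.0, rng=rng)
        s, total = 0, 0.0
        for h in range(horizon):
            a = greedy_action(thetas[h], phi, s, n_actions)
            s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states - 1 else 0.0
            data[h].append((s, a, r, s_next))
            s, total = s_next, total + r
        returns.append(total)
    return returns
```

Calling `run_chain_demo()` returns the per-episode returns. Because exploration comes from resampling `thetas` once per episode rather than from per-step random actions, the agent commits to a consistent (randomized) plan for the whole episode. That is the intuition the abstract contrasts with Boltzmann and ε-greedy exploration.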