International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2011)

Basis Function Discovery using Spectral Clustering and Bisimulation Metrics (Extended Abstract)

Abstract

Markov Decision Processes (MDPs) are a powerful framework for modeling sequential decision making for intelligent agents acting in stochastic environments. One of the important challenges facing such agents in practical applications is finding a suitable way to represent the state space, so that a good way of behaving can be learned efficiently. In this paper, we focus on learning a good policy when function approximation must be used to represent the value function. In this case, states are mapped into feature vectors, and a set of parameters is learned, which allows us to approximate the value of any given state. Theoretically, the quality of the approximation that can be obtained depends on the set of features. In practice, the feature set affects not only the quality of the solution obtained, but also the speed of learning.
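
As a rough illustration of how the feature set bounds the quality of a linear value-function approximation, the sketch below evaluates a fixed policy with TD(0) under two feature maps. Everything in it is an assumption made for illustration and is not taken from the paper: the five-state deterministic chain, the discount factor, the `one_hot` and `constant` feature maps, and the use of TD(0) itself. It does not implement the spectral clustering or bisimulation-metric construction named in the title.

```python
GAMMA = 0.9      # discount factor (assumed for this toy example)
N_STATES = 5     # states 0..4; state 4 is terminal with value 0


def step(state):
    """Hypothetical deterministic chain: always move right.

    The reward is 1 on entering the terminal state and 0 otherwise.
    """
    next_state = state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def true_value(state):
    """Exact discounted return from a non-terminal state of the chain."""
    return GAMMA ** (N_STATES - 2 - state)


def one_hot(state):
    """One feature per non-terminal state (equivalent to a lookup table)."""
    return [1.0 if i == state else 0.0 for i in range(N_STATES - 1)]


def constant(state):
    """A single feature shared by all states (a deliberately poor feature set)."""
    return [1.0]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def td0(features, n_features, episodes=2000, alpha=0.1):
    """TD(0) policy evaluation with the linear approximation V(s) = w . phi(s)."""
    w = [0.0] * n_features
    for _ in range(episodes):
        s = 0
        while s != N_STATES - 1:
            s_next, r = step(s)
            # The terminal state's value is fixed at 0 and never approximated.
            v_next = 0.0 if s_next == N_STATES - 1 else dot(w, features(s_next))
            x = features(s)
            delta = r + GAMMA * v_next - dot(w, x)   # TD error
            w = [wi + alpha * delta * xi for wi, xi in zip(w, x)]
            s = s_next
    return w


def max_error(w, features):
    """Largest absolute gap between approximate and exact values."""
    return max(abs(dot(w, features(s)) - true_value(s))
               for s in range(N_STATES - 1))


w_tab = td0(one_hot, N_STATES - 1)
w_const = td0(constant, 1)
err_tab = max_error(w_tab, one_hot)
err_const = max_error(w_const, constant)
print("max error, one-hot features: ", err_tab)
print("max error, constant feature: ", err_const)
```

With one-hot features the learned weights can match the exact values of this chain, while a single constant feature cannot represent values that differ across states, so its error stays bounded away from zero no matter how long learning runs. Richer problems would need features between these two extremes, which is the gap that basis function discovery aims to fill.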
