Exploration and exploitation balance management in fuzzy reinforcement learning

Abstract

This paper offers a fuzzy scheme for managing the balance between exploration and exploitation, which can be implemented in any critic-only fuzzy reinforcement learning method. The paper, however, focuses on a newly developed continuous reinforcement learning method called fuzzy Sarsa learning (FSL), chosen for its advantages. Establishing this balance depends greatly on the accuracy of the action value function approximation. First, the overfitting problem in approximating the action value function in continuous reinforcement learning algorithms is discussed, and a new adaptive learning rate is proposed to prevent it. By relating the learning rate to the inverse of the "fuzzy visit value" of the current state, the training data set is forced to have a uniform effect on the weight parameters of the approximator, and hence overfitting is resolved. Then, a fuzzy balancer is introduced to balance exploration against exploitation by generating a suitable temperature factor for the Softmax formula. Finally, an enhanced FSL (EFSL) is obtained by integrating the proposed adaptive learning rate and the fuzzy balancer into FSL. Simulation results show that EFSL eliminates overfitting, manages the balance well, and outperforms FSL in terms of learning speed and action quality.
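
The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract gives neither the fuzzy balancer's rules nor the exact learning rate formula, so `visit_scaled_rate` and its `base_rate / (1 + visit value)` form are assumptions made purely for illustration, and the temperature is passed in by hand rather than generated by a fuzzy balancer. Only the standard Softmax (Boltzmann) selection rule is taken as given.

```python
import math


def softmax_probs(q_values, temperature):
    """Standard Softmax (Boltzmann) action probabilities for a list of action values."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    exps = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def visit_scaled_rate(base_rate, fuzzy_visit_value):
    """Hypothetical rule (an assumption, not the paper's formula):
    the learning rate shrinks as the accumulated visit value of a state grows."""
    return base_rate / (1.0 + fuzzy_visit_value)


q = [1.0, 2.0, 3.0]
# A high temperature gives nearly uniform probabilities (exploration).
explore = softmax_probs(q, temperature=100.0)
# A low temperature concentrates probability on the best action (exploitation).
exploit = softmax_probs(q, temperature=0.1)
```

In this sketch a balancer would only need to output one scalar, the temperature, to move the policy between the two regimes, which matches the role the abstract assigns to the fuzzy balancer.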

Bibliographic record

  • Source
    Fuzzy Sets and Systems | 2010, Issue 4 | pp. 578-595 | 18 pages
  • Author affiliations

    Intelligent Control Systems Laboratory, School of Electrical Engineering, Tarbiat Modares University, P.O. Box 14115-143, Tehran, Iran;

    Intelligent Control Systems Laboratory, School of Electrical Engineering, Tarbiat Modares University, P.O. Box 14115-143, Tehran, Iran;

    Control and Intelligent Processing Center of Excellence, University of Tehran, Tehran, Iran; School of Cognitive Science, Institute for Research in Fundamental Sciences, Tehran, Iran;

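  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA);
  • Original format: PDF
  • Language of full text: English
  • Chinese Library Classification (CLC)
  • Keywords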

    reinforcement learning; decision analysis; fuzzy control; exploration; exploitation;

  • Date added to database: 2022-08-18 02:59:18
