International Conference on Artificial Intelligence and Mechatronics Systems

A Comparative Analysis of Multiple Biasing Techniques for Q_biased Softmax Regression Algorithm



Abstract

Over the past several years, the adoption of robotic workers has surged. Many tasks previously considered insurmountable can now be performed by robots efficiently and with ease, largely owing to recent advances in control systems and artificial intelligence. Lately, Reinforcement Learning (RL) has captured the spotlight in robotics: instead of explicitly specifying the solution to a particular task, RL lets the robot (agent) explore its environment and choose appropriate responses through trial and error. In this paper, a comparative analysis of biasing techniques for the Q-biased softmax regression (QBIASSR) algorithm is presented. In QBIASSR, decision-making for unexplored states depends on the set of previously explored states, which improves the learning process when the robot reaches an unexplored state. A bias vector bias(s) is calculated from the variable values of experienced states and added to the Q-value function for action selection. To obtain an optimized reward, different techniques for calculating bias(s) are adopted. The performance of all the techniques is evaluated and compared on an obstacle-avoidance task with a mobile robot. Finally, we demonstrate that the cumulative reward generated by the technique proposed in this paper is at least two times greater than the baseline.
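The core idea described above is that a bias vector, derived from previously experienced states, is added to the Q-values before softmax action selection, so that unexplored states inherit preferences from explored ones. The following minimal Python sketch illustrates that mechanism; the averaging rule used for bias(s) and all function names here are illustrative assumptions, not the specific biasing techniques compared in the paper.

import numpy as np

def bias_from_experience(experienced_q_rows: np.ndarray) -> np.ndarray:
    """Hypothetical bias(s): average the Q-rows of previously explored states.

    experienced_q_rows has shape (num_explored_states, num_actions).
    """
    if experienced_q_rows.shape[0] == 0:
        return np.zeros(experienced_q_rows.shape[1])
    return experienced_q_rows.mean(axis=0)

def qbias_softmax_action(q_row: np.ndarray, bias: np.ndarray,
                         temperature: float = 1.0) -> int:
    """Select an action by softmax over Q(s, .) + bias(s)."""
    logits = (q_row + bias) / temperature
    logits -= logits.max()          # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

if __name__ == "__main__":
    np.random.seed(0)
    explored = np.random.normal(size=(5, 4))   # Q-rows of 5 explored states, 4 actions
    unexplored_q = np.zeros(4)                 # current, not-yet-explored state
    b = bias_from_experience(explored)
    action = qbias_softmax_action(unexplored_q, b, temperature=0.5)
    print("bias:", b, "chosen action:", action)

In this sketch an unexplored state has an all-zero Q-row, so the bias term alone shapes the action distribution, which is how the algorithm reuses experience when the robot reaches new states.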
