
Koolio: Path-planning using reinforcement learning on a real robot in a real environment.



Abstract

There are many cases where it is not possible to program a robot with precise instructions. The environment may be unknown, or the programmer may not know the best way to solve a problem. In such cases, machine learning is useful for providing the robot, or agent, with a policy: a schema for determining choices based on inputs.

The two primary groups of machine learning methods are Supervised Learning, in which a supervisor provides training data to help the agent learn, and Reinforcement Learning, which requires only a set of rewards for certain choices. Of the three categories of Reinforcement Learning (Dynamic Programming, Monte Carlo, and Temporal Difference), the Temporal Difference method known as Q-Learning was chosen.

Q-Learning is a Markov method that uses a weighted decision table to determine the best choice for any given set of sensor inputs. The values in this Q-table are calculated with the Q-formula, which weighs the expected value of a decision based on the known reward and uses a discount factor so that rewards expected sooner have a greater effect on the values than rewards far in the future. The Q-table also makes the learning modular: a learning agent needs only the file containing the table in order to use a policy learned by a different agent.

Because of the large number of iterations required for Q-Learning to reach an optimal policy, a simulator was required. The simulator provided a means by which the agent could learn behaviors without concerns such as parts wearing down or an untrained robot colliding with a wall.

After a policy was found in simulation, the Q-table was transferred into Koolio, a refrigerator robot, allowing it to navigate the hallways with the experience gathered in simulation. The Q-table was then further refined through additional learning on the real robot.
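The tabular Q-Learning loop described above (Q-formula update, epsilon-greedy choice from the Q-table, and saving the table to a file so another agent can reuse the policy) can be sketched as follows. This is an illustrative toy example, not the dissertation's actual code: the corridor environment, constants, and file name are assumptions chosen only to demonstrate the technique.

```python
import random
import pickle

# A minimal sketch of tabular Q-Learning on a toy 1-D corridor
# (environment and constants are illustrative, not from the dissertation).
# States 0..N_STATES-1; the agent starts at 0 and is rewarded at the goal.

N_STATES = 6
ACTIONS = [-1, +1]          # step left, step right
GOAL = N_STATES - 1

ALPHA = 0.1                  # learning rate
GAMMA = 0.9                  # discount factor: nearer rewards count more
EPSILON = 0.1                # exploration rate

def step(state, action):
    """Environment dynamics: move, clip to the corridor, reward at goal."""
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    # The Q-table: one weighted value per (state, action) pair.
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy choice from the Q-table.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            # Q-formula: move Q(s,a) toward r + gamma * max_a' Q(s',a').
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (
                reward + GAMMA * best_next - q[(state, action)]
            )
            state = nxt
    return q

q_table = train()

# Modularity: the learned policy is just the table, so another agent
# (or the real robot) only needs this file to reuse and refine it.
with open("q_table.pkl", "wb") as f:
    pickle.dump(q_table, f)
```

After training, acting greedily with respect to the table (`max` over actions at each state) recovers the learned policy; reloading `q_table.pkl` and continuing the same update loop corresponds to the further refinement on the real robot described above.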

Bibliographic details

  • Author: Zamstein, Lavi Michael.
  • Author affiliation: University of Florida.
  • Degree grantor: University of Florida.
  • Subject: Engineering Electronics and Electrical; Artificial Intelligence; Engineering Robotics.
  • Degree: Ph.D.
  • Year: 2009
  • Pages: 320 p.
  • Total pages: 320
  • Format: PDF
  • Language: eng
  • Classification (CLC): Radio electronics and telecommunications; Artificial intelligence theory
  • Keywords
