Conference: IEEE International Conference on Real-time Computing and Robotics

Towards High Level Skill Learning: Learn to Return Table Tennis Ball Using Monte-Carlo Based Policy Gradient Method

Abstract

Deep learning has achieved great success in visual and acoustic recognition and classification tasks, and the accuracy of many state-of-the-art methods has surpassed that of humans. In robotics, however, it remains a major challenge for a real robot to master a high-level skill with deep learning methods, even though humans can easily learn such skills through demonstration, imitation, and practice. Unlike Go and Atari games, these tasks are usually continuous in both state space and action space, which makes value-based reinforcement learning methods not directly applicable. Learning to return a ball to a desired point in table tennis is a typical example of such a task. It would be a promising step if a robot could learn to play table tennis without exact knowledge of the models involved in the sport, just as human players do. In this paper, we treat this kind of motion-decision skill learning as a one-step decision-making process and present a Monte-Carlo-based reinforcement learning method within the Deep Deterministic Policy Gradient (DDPG) framework. We then apply this method to robotic table tennis and test it on two tasks: the first is to return balls to a designated point, and the second is to return balls to randomly selected landing points. The experimental results demonstrate that the trained policy can return balls with random motion states to both a designated point and randomly selected landing points with high accuracy.
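One plausible reading of "Monte-Carlo based" in a one-step setting: because each stroke ends the episode, the Monte-Carlo return is simply the observed reward, so a DDPG-style critic Q(s, a) can be regressed directly onto rewards instead of bootstrapped TD targets. The actor is then improved with the deterministic policy gradient, grad_theta J ≈ mean over states of dQ/da evaluated at a = mu_theta(s), times grad_theta mu_theta(s). The sketch below illustrates only that idea on a toy problem and is not the paper's implementation. The reward function, the quadratic critic features, the linear actor, the least-squares critic fit (a stand-in for the neural-network regression DDPG would normally use), and all parameter values are hypothetical.

```python
import numpy as np

# Hypothetical one-step task: state s (e.g. a summary of the incoming ball),
# action a (e.g. one racket parameter). The reward, unknown to the learner,
# peaks at a = 2*s. This stands in for "distance to the desired landing point".
def reward(s, a):
    return -(a - 2.0 * s) ** 2

def features(s, a):
    # Hypothetical quadratic features: a linear critic Q(s, a) = w . phi(s, a)
    # is then expressive enough to fit this toy reward exactly.
    return np.stack([np.ones_like(s), s, a, s**2, a**2, s * a], axis=-1)

def dq_da(w, s, a):
    # Partial derivative of the critic with respect to the action.
    return w[2] + 2.0 * w[4] * a + w[5] * s

def train(iterations=500, batch=32, noise=0.3, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)          # linear deterministic actor: mu(s) = theta0 + theta1*s
    phi_buf, ret_buf = [], []    # replay buffer of (features, Monte-Carlo return)
    w = np.zeros(6)              # critic weights
    for _ in range(iterations):
        s = rng.uniform(-1.0, 1.0, batch)
        # Gaussian exploration noise around the deterministic action.
        a = theta[0] + theta[1] * s + noise * rng.standard_normal(batch)
        # One-step episode: the Monte-Carlo return equals the immediate reward,
        # so no bootstrapped TD target (and no target network) is needed.
        phi_buf.append(features(s, a))
        ret_buf.append(reward(s, a))
        # Critic: least-squares regression onto all observed returns so far.
        w, *_ = np.linalg.lstsq(np.concatenate(phi_buf),
                                np.concatenate(ret_buf), rcond=None)
        # Actor: deterministic policy gradient, dQ/da at a = mu(s) times
        # dmu/dtheta = [1, s], averaged over the batch (gradient ascent).
        mu = theta[0] + theta[1] * s
        g = dq_da(w, s, mu)
        theta += lr * np.array([g.mean(), (g * s).mean()])
    return theta, w

if __name__ == "__main__":
    theta, w = train()
    print("actor parameters:", theta)  # should approach [0, 2], i.e. mu(s) = 2*s
```

In this toy setting the learned actor converges toward mu(s) = 2s, the action that maximizes the hypothetical reward, even though the learner never sees the reward formula, only sampled rewards. Dropping the bootstrapped target is what the one-step assumption buys; in a multi-step task the critic target would have to include future returns.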