ACM Transactions on Graphics

Learning to Dress: Synthesizing Human Dressing Motion via Deep Reinforcement Learning


Abstract

Creating animation of a character putting on clothing is challenging due to the complex interactions between the character and the simulated garment. We take a model-free deep reinforcement learning (deepRL) approach to automatically discovering robust dressing control policies represented by neural networks. While deepRL has demonstrated several successes in learning complex motor skills, the data-demanding nature of the learning algorithms is at odds with the computationally costly cloth simulation required by the dressing task. This paper is the first to demonstrate that, with an appropriately designed input state space and a reward function, it is possible to incorporate cloth simulation in the deepRL framework to learn a robust dressing control policy. We introduce a salient representation of haptic information to guide the dressing process and utilize it in the reward function to provide learning signals during training. In order to learn a prolonged sequence of motion involving a diverse set of manipulation skills, such as grasping the edge of the shirt or pulling on a sleeve, we find it necessary to separate the dressing task into several subtasks and learn a control policy for each subtask. We introduce a policy sequencing algorithm that matches the distribution of output states from one task to the input distribution for the next task in the sequence. We have used this approach to produce character controllers for several dressing tasks: putting on a t-shirt, putting on a jacket, and robot-assisted dressing of a sleeve.
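The policy sequencing step admits a compact illustration. Below is a minimal, hypothetical Python sketch of the idea as described in the abstract: after training each subtask policy, the terminal states of its rollouts are fit with a distribution, and the next subtask is trained from initial states sampled from that fit, so successive policies chain together. The function names (train_subtask_policy, rollout) and the diagonal-Gaussian choice are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of policy sequencing: match the output-state
# distribution of subtask k to the input-state distribution of subtask k+1.
# train_subtask_policy and rollout are illustrative placeholders supplied
# by the caller, not the paper's code.
import numpy as np

def fit_state_distribution(terminal_states):
    """Fit a diagonal Gaussian (an assumed choice) to terminal states."""
    states = np.asarray(terminal_states)
    return states.mean(axis=0), states.std(axis=0) + 1e-6

def sequence_policies(subtasks, train_subtask_policy, rollout,
                      n_rollouts=100, seed=0):
    """Train subtask policies in order; each task starts from states
    sampled from the previous task's fitted terminal-state distribution."""
    rng = np.random.default_rng(seed)
    policies = []
    init_sampler = None  # first subtask starts from its own default state
    for task in subtasks:
        policy = train_subtask_policy(task, init_sampler)
        # Collect terminal states reached by the newly trained policy.
        terminal_states = [rollout(task, policy, init_sampler)
                           for _ in range(n_rollouts)]
        mean, std = fit_state_distribution(terminal_states)
        # Next task's initial states are drawn from this fit.
        init_sampler = lambda m=mean, s=std: rng.normal(m, s)
        policies.append(policy)
    return policies
```

Matching distributions this way ensures each policy begins training from states the preceding policy can actually reach, which is the chaining property the abstract describes.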
