International Conference on Robotics and Automation

Learning to Write Anywhere with Spatial Transformer Image-to-Motion Encoder-Decoder Networks



Abstract

Learning to recognize and reproduce handwritten characters is already a challenging task for humans and robots alike, but learning to do the same for characters that can be arbitrarily transformed in space, as humans do when writing on a blackboard, significantly raises the difficulty from a robot vision and control perspective. In previous work we proposed several forms of encoder-decoder networks capable of mapping raw images of digits to dynamic movement primitives (DMPs), so that a robot could learn to translate digit images into motion trajectories and reproduce them in written form. However, even with convolutional layers in the image encoder, these networks are spatially invariant or equivariant only to a limited extent. In this paper, we propose a new architecture that combines an image-to-motion encoder-decoder with a spatial transformer in a fully differentiable overall network. The network learns to rectify affine-transformed digits in input images into canonical forms before converting them into DMPs with accompanying motion trajectories, which are finally transformed back to match the original digit drawings so that a robot can write them in their original forms. We present experiments on several challenging datasets that demonstrate the superiority of the new architecture over our previous work, and we demonstrate its use with a humanoid robot in a real writing task.
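The rectification step described in the abstract relies on a spatial transformer: a differentiable module that warps the input image by a predicted affine transform via grid sampling. Below is a minimal NumPy sketch of that sampling operation; the function name `affine_grid_sample` and the bilinear implementation details are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def affine_grid_sample(img, theta):
    """Warp a grayscale image by the 2x3 affine matrix `theta`, sampling
    with bilinear interpolation over a normalized [-1, 1] coordinate grid,
    as in a spatial transformer's grid generator + sampler."""
    h, w = img.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # homogeneous coords
    sx, sy = theta @ grid                 # source coordinates in [-1, 1]
    # Map normalized source coordinates back to pixel indices.
    px = (sx + 1) * (w - 1) / 2
    py = (sy + 1) * (h - 1) / 2
    x0 = np.clip(np.floor(px).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(py).astype(int), 0, h - 2)
    ax, ay = px - x0, py - y0             # bilinear interpolation weights
    out = (img[y0, x0] * (1 - ax) * (1 - ay)
           + img[y0, x0 + 1] * ax * (1 - ay)
           + img[y0 + 1, x0] * (1 - ax) * ay
           + img[y0 + 1, x0 + 1] * ax * ay)
    return out.reshape(h, w)

# Identity transform: the warped image should reproduce the input.
identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
img = np.random.rand(8, 8)
out = affine_grid_sample(img, identity)
```

Because the sampling weights are differentiable in `theta`, gradients can flow from the motion-decoding loss back into the localization network that predicts the transform, which is what makes the overall architecture end-to-end trainable.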
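On the motion side, each rectified digit is decoded into a dynamic movement primitive. The following is a minimal sketch of a 1-D discrete DMP rollout under the standard transformation-system formulation; the gains, basis-function count, and the name `rollout_dmp` are assumptions for illustration, not the paper's parameters.

```python
import numpy as np

def rollout_dmp(y0, g, w, centers, widths, tau=1.0, dt=0.01,
                alpha_z=25.0, alpha_x=1.0):
    """Integrate a 1-D discrete DMP: a critically damped spring toward
    goal g, modulated by a learned forcing term over a decaying phase x."""
    beta_z = alpha_z / 4.0            # critical damping
    y, z, x = y0, 0.0, 1.0            # position, scaled velocity, phase
    traj = [y]
    for _ in range(int(tau / dt)):
        psi = np.exp(-widths * (x - centers) ** 2)          # RBF basis over phase
        f = (psi @ w) / (psi.sum() + 1e-10) * x * (g - y0)  # forcing term
        dz = (alpha_z * (beta_z * (g - y) - z) + f) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z += dz * dt
        y += dy * dt
        x += dx * dt
        traj.append(y)
    return np.array(traj)

centers = np.linspace(0.0, 1.0, 10)   # basis centers in phase space
widths = np.full(10, 50.0)
# With zero weights the forcing term vanishes and the DMP converges to g.
traj = rollout_dmp(y0=0.0, g=1.0, w=np.zeros(10),
                   centers=centers, widths=widths)
```

In the architecture described above, the decoder would output the weights `w` (one set per pen-stroke dimension) from the rectified image, and the resulting trajectory would be affine-transformed back into the original drawing frame before execution on the robot.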
