Journal: Robotics and Autonomous Systems

Learning a bidirectional mapping between human whole-body motion and natural language using deep recurrent neural networks



Abstract

Linking human whole-body motion and natural language is of great interest for the generation of semantic representations of observed human behaviors as well as for the generation of robot behaviors based on natural language input. While there has been a large body of research in this area, most approaches that exist today require a symbolic representation of motions (e.g. in the form of motion primitives), which must either be defined a priori or obtained through complex segmentation algorithms. In contrast, recent advances in the field of neural networks, and especially in deep learning, have demonstrated that sub-symbolic representations learned end-to-end usually outperform more traditional approaches in applications such as machine translation. In this paper we propose a generative model that learns a bidirectional mapping between human whole-body motion and natural language using deep recurrent neural networks (RNNs) and sequence-to-sequence learning. Our approach requires no segmentation or manual feature engineering and learns a distributed representation that is shared across all motions and descriptions. We evaluate our approach on 2,846 human whole-body motions and 6,187 natural language descriptions thereof from the KIT Motion-Language Dataset. Our results clearly demonstrate the effectiveness of the proposed model: we show that our model generates a wide variety of realistic motions from nothing more than a single-sentence description. Conversely, our model is also capable of generating correct and detailed natural language descriptions from human motions. (C) 2018 Elsevier B.V. All rights reserved.
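The motion-to-language direction described in the abstract can be illustrated with a minimal sequence-to-sequence sketch: an encoder RNN consumes a sequence of per-frame motion features, and a decoder RNN, conditioned on the encoder's final hidden state, emits word logits. This is only an assumption-laden illustration of the general technique, not the authors' architecture; all dimensions, class names, and the choice of GRU cells are hypothetical.

```python
import torch
import torch.nn as nn

class MotionToLanguageSeq2Seq(nn.Module):
    """Illustrative seq2seq sketch: motion feature frames in, word logits out.
    All sizes (44 motion features, vocab of 1000, hidden 128) are hypothetical."""

    def __init__(self, motion_dim=44, vocab_size=1000, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(motion_dim, hidden, batch_first=True)
        self.embed = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, motion, tokens):
        # Encode the whole motion sequence; keep only the final hidden state
        # as a fixed-size context vector.
        _, context = self.encoder(motion)
        # Decode teacher-forced word tokens, initialized from that context.
        dec_out, _ = self.decoder(self.embed(tokens), context)
        # Project decoder states to per-step vocabulary logits.
        return self.out(dec_out)

model = MotionToLanguageSeq2Seq()
motion = torch.randn(2, 100, 44)          # 2 motions, 100 frames, 44 features each
tokens = torch.randint(0, 1000, (2, 12))  # 2 teacher-forced token sequences
logits = model(motion, tokens)
print(logits.shape)  # torch.Size([2, 12, 1000])
```

The reverse (language-to-motion) direction would mirror this structure with the roles of the two modalities swapped; the paper's shared distributed representation would further tie the two directions together, which this sketch does not attempt.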
