Conference: IEEE/CVF Conference on Computer Vision and Pattern Recognition

Sign Language Transformers: Joint End-to-End Sign Language Recognition and Translation

Abstract

Prior work on Sign Language Translation has shown that having a mid-level sign gloss representation (effectively recognizing the individual signs) improves translation performance drastically. In fact, the current state-of-the-art in translation requires gloss-level tokenization in order to work. We introduce a novel transformer-based architecture that jointly learns Continuous Sign Language Recognition and Translation while being trainable in an end-to-end manner. This is achieved by using a Connectionist Temporal Classification (CTC) loss to bind the recognition and translation problems into a single unified architecture. This joint approach does not require any ground-truth timing information, simultaneously solves two co-dependent sequence-to-sequence learning problems, and leads to significant performance gains. We evaluate the recognition and translation performance of our approaches on the challenging RWTH-PHOENIX-Weather-2014T (PHOENIX14T) dataset. We report state-of-the-art sign language recognition and translation results achieved by our Sign Language Transformers. Our translation networks outperform both sign-video-to-spoken-language and gloss-to-spoken-language translation models, in some cases more than doubling the performance (9.58 vs. 21.80 BLEU-4 score). We also share new baseline translation results using transformer networks for several other text-to-text sign language translation tasks.
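The abstract states that a CTC loss binds recognition and translation into one architecture. The following is a minimal sketch of what such a joint objective can look like, not the authors' implementation. It assumes the encoder emits per-frame log-probabilities over a gloss vocabulary (index 0 reserved for the CTC blank) and the decoder emits per-step log-probabilities over spoken-language words. The names `ctc_loss`, `translation_loss`, `joint_loss` and the weights `lambda_r`, `lambda_t` are hypothetical. The CTC term is the standard forward algorithm computed in log space, written in plain Python for readability.

```python
import math
from typing import Sequence

NEG_INF = float("-inf")


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_loss(log_probs: Sequence[Sequence[float]], glosses: Sequence[int], blank: int = 0) -> float:
    """CTC negative log-likelihood of a gloss sequence (no timing needed).

    log_probs[t][k] is log p(gloss k | frame t); glosses contains no blanks.
    """
    # Extended label sequence: blank, g1, blank, g2, ..., blank
    ext = [blank]
    for g in glosses:
        ext += [g, blank]
    S, T = len(ext), len(log_probs)

    # alpha[s]: log-probability of all alignments ending in ext[s] at frame t
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][blank]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            terms = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                terms.append(alpha[s - 1])  # advance by one symbol
            # Skip a blank only between two different glosses
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                terms.append(alpha[s - 2])
            new[s] = logsumexp(*terms) + log_probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last gloss or on the trailing blank
    total = alpha[0] if S == 1 else logsumexp(alpha[S - 1], alpha[S - 2])
    return -total


def translation_loss(word_log_probs: Sequence[Sequence[float]], words: Sequence[int]) -> float:
    """Cross-entropy of the reference spoken-language words under the decoder."""
    return -sum(step[w] for step, w in zip(word_log_probs, words))


def joint_loss(frame_log_probs, glosses, word_log_probs, words,
               lambda_r: float = 1.0, lambda_t: float = 1.0) -> float:
    """Hypothetical weighted sum of recognition (CTC) and translation losses."""
    return (lambda_r * ctc_loss(frame_log_probs, glosses)
            + lambda_t * translation_loss(word_log_probs, words))
```

Because CTC sums over every frame-to-gloss alignment, the recognition term needs only the ordered gloss sequence, not per-sign timestamps. This is consistent with the abstract's point that no ground-truth timing information is required. A practical system would use a batched framework implementation such as PyTorch's `torch.nn.CTCLoss` rather than this loop.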
