Conference: Machine Translation Summit

Translator2Vec: Understanding and Representing Human Post-Editors


Abstract

The combination of machines and humans for translation is effective, with many studies showing productivity gains when humans post-edit machine-translated output instead of translating from scratch. To take full advantage of this combination, we need a fine-grained understanding of how human translators work, and which post-editing styles are more effective than others. In this paper, we release and analyze a new dataset with document-level post-editing action sequences, including edit operations from keystrokes, mouse actions, and waiting times. Our dataset comprises 66,268 full document sessions post-edited by 332 humans, the largest of the kind released to date. We show that action sequences are informative enough to identify post-editors accurately, compared to baselines that only look at the initial and final text. We build on this to learn and visualize continuous representations of post-editors, and we show that these representations improve the downstream task of predicting post-editing time.
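To make the idea of a post-editing action sequence concrete, here is a minimal sketch of how one session might be turned into tokens and simple frequency features. It is an illustration only, not the dataset's actual schema or the authors' model: the `Action` class, the labels "insert", "delete", "mouse", and "wait", and the one-second threshold for bucketing waits are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One step in a post-editing session (hypothetical schema)."""
    kind: str             # assumed labels: "insert", "delete", "mouse", "wait"
    seconds: float = 0.0  # only meaningful when kind == "wait"


def to_tokens(session):
    """Map a session to a token sequence, bucketing waits by duration."""
    tokens = []
    for action in session:
        if action.kind == "wait":
            # Assumed threshold: waits under one second count as "short".
            bucket = "short" if action.seconds < 1.0 else "long"
            tokens.append(f"wait_{bucket}")
        else:
            tokens.append(action.kind)
    return tokens


def frequency_features(session):
    """Return the relative frequency of each token in the session."""
    counts = Counter(to_tokens(session))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {token: n / total for token, n in counts.items()}


# Toy session, invented for illustration.
session = [
    Action("wait", 2.5),
    Action("delete"),
    Action("insert"),
    Action("insert"),
    Action("wait", 0.2),
]
features = frequency_features(session)
```

Frequency features like these discard the order of actions. A sequence model over the same tokens would keep it, which is closer in spirit to the action-sequence view described in the abstract.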
