Conference: Annual Meeting of the Association for Computational Linguistics

Enlisting the Ghost: Modeling Empty Categories for Machine Translation

Abstract

Empty categories (EC) are artificial elements in Penn Treebanks motivated by the government-binding (GB) theory to explain certain language phenomena such as pro-drop. ECs are ubiquitous in languages like Chinese, but they are tacitly ignored in most machine translation (MT) work because of their elusive nature. In this paper we present a comprehensive treatment of ECs by first recovering them with a structured MaxEnt model with a rich set of syntactic and lexical features, and then incorporating the predicted ECs into a Chinese-to-English machine translation task through multiple approaches, including the extraction of EC-specific sparse features. We show that the recovered empty categories not only improve the word alignment quality, but also lead to significant improvements in a large-scale state-of-the-art syntactic MT system.
