Augmenting Statistical Machine Translation (SMT) systems with syntactic information aims at improving translation quality. Hierarchical Phrase-Based (HPB) SMT takes a step toward incorporating syntax in Phrase-Based (PB) SMT by modelling one aspect of language syntax, namely the hierarchical structure of phrases. Syntax Augmented Machine Translation (SAMT) further incorporates syntactic information extracted using context-free phrase structure grammar (CF-PSG) into the HPB SMT model. One of the main challenges facing CF-PSG-based augmentation approaches for SMT systems arises from the difference between the definition of the constituent in CF-PSG and the ‘phrase’ in SMT systems, which hinders the ability of CF-PSG to express the syntactic function of many SMT phrases. SAMT addresses this problem by using ‘CCG-like’ operators to combine constituent labels. Although this improves the coverage of syntactic constraints, it significantly increases their sparsity, which restricts translation and negatively affects its quality.

In this thesis, we use Combinatory Categorial Grammar (CCG) to address the problems of sparsity and limited coverage of syntactic constraints that face CF-PSG-based syntax augmentation approaches for HPB SMT. We demonstrate that CCG’s flexible structures and rich syntactic descriptors help to extract richer, more expressive and less sparse syntactic constraints with better coverage than CF-PSG, which enables our CCG-augmented HPB system to outperform the SAMT system. We also try to soften the syntactic constraints imposed by CCG category nonterminal labels by extracting less fine-grained CCG-based labels. We demonstrate that CCG label simplification helps to significantly improve the performance of our CCG category HPB system. Finally, we identify the factors which limit the coverage of the syntactic constraints in our CCG-augmented HPB model. We then try to tackle these factors by extending the definition of the nonterminal label to be composed of a sequence of CCG categories and by augmenting the glue grammar with CCG combinatory rules. We demonstrate that these extensions help to significantly increase the scope of the syntactic constraints applied in our CCG-augmented HPB model and achieve significant improvements over the HPB SMT baseline.