The hierarchical phrase-based translation model, a relatively recent proposal for statistical machine translation (SMT), has achieved state-of-the-art performance in numerous recent translation evaluations. Hierarchical phrase-based systems comprise a pipeline of modules with complex interactions. In this thesis, we propose refinements to the hierarchical phrase-based model as well as improvements to, and analyses of, various modules of hierarchical phrase-based systems.

We took advantage of the increasing amounts of training data available for machine translation, as well as existing frameworks for distributed computing, to build better infrastructure for the extraction, estimation and retrieval of hierarchical phrase-based grammars. We design and implement grammar extraction as a series of Hadoop MapReduce jobs. We store the resulting grammar in the HFile format, which offers competitive trade-offs in terms of efficiency and simplicity. We demonstrate improvements over two alternative solutions used in machine translation.

The modular nature of the SMT pipeline, while allowing individual improvements, has the disadvantage that errors committed by one module are propagated to the next. This thesis alleviates this issue between the word alignment module and the grammar extraction and estimation module by using richer statistics from word alignment models during extraction. We use alignment link and alignment phrase pair posterior probabilities for grammar extraction and estimation, and demonstrate improvements in Chinese-to-English translation.

This thesis also proposes refinements in grammar and language modelling, both in the context of domain adaptation and in the context of the interaction between first-pass decoding and lattice rescoring. We analyse alternative strategies for cross-domain adaptation of the grammar and the language model. We also study interactions between the first-pass and second-pass language models in terms of size and n-gram order. Finally, we analyse two smoothing methods for large 5-gram language model rescoring.

The last two chapters are devoted to the application of phrase-based grammars to the string regeneration task, which we consider a means to study the fluency of machine translation output. We design and implement a monolingual phrase-based decoder for string regeneration and achieve state-of-the-art performance on this task. By applying our decoder to the output of a hierarchical phrase-based translation system, we are able to recover the same level of translation quality as the translation system.
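String regeneration asks a system to restore a fluent word order given only the words of a sentence. As a rough illustration only, the sketch below reorders a bag of words with a left-to-right beam search scored by a bigram language model. It is a simplified, word-level stand-in, not the phrase-based decoder described in this thesis: the bigram table `BIGRAM_LOGPROB`, the `UNSEEN` penalty and the `regenerate` function are hypothetical, and a real system would use phrases and a smoothed higher-order n-gram model.

```python
import math
from collections import Counter

# Hypothetical toy bigram log-probabilities; "<s>" and "</s>" mark sentence boundaries.
# Unseen bigrams receive a fixed penalty instead of a properly smoothed estimate.
BIGRAM_LOGPROB = {
    ("<s>", "the"): math.log(0.6),
    ("the", "cat"): math.log(0.5),
    ("cat", "sat"): math.log(0.4),
    ("sat", "down"): math.log(0.5),
    ("down", "</s>"): math.log(0.7),
}
UNSEEN = math.log(1e-4)


def score(prev, word):
    """Log-probability of `word` following `prev` under the toy bigram model."""
    return BIGRAM_LOGPROB.get((prev, word), UNSEEN)


def regenerate(bag, beam_size=4):
    """Reorder a bag of words into a sentence with left-to-right beam search."""
    # Each hypothesis: (log score, words emitted so far, multiset of words still to place).
    beam = [(0.0, ["<s>"], Counter(bag))]
    for _ in range(len(bag)):
        expanded = []
        for logp, words, remaining in beam:
            for w in remaining:  # try each distinct remaining word once
                rest = remaining.copy()
                rest[w] -= 1
                if rest[w] == 0:
                    del rest[w]
                expanded.append((logp + score(words[-1], w), words + [w], rest))
        # Prune to the best partial hypotheses.
        expanded.sort(key=lambda h: h[0], reverse=True)
        beam = expanded[:beam_size]
    # Add the end-of-sentence transition and return the best complete ordering.
    best = max(beam, key=lambda h: h[0] + score(h[1][-1], "</s>"))
    return best[1][1:]  # drop the "<s>" marker


print(" ".join(regenerate(["sat", "the", "down", "cat"])))
```

With the toy table above, `regenerate(["sat", "the", "down", "cat"])` returns `["the", "cat", "sat", "down"]`. A wider beam trades speed for a lower risk of pruning the best ordering early.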