This thesis presents the design and implementation of linguistically-informed models for statistical phrase-based machine translation. Using Koehn’s Pharaoh (2004), a state-of-the-art SMT system, and Moses (Hoang, 2006), a variant of Pharaoh that supports factored translation models, we investigated two approaches: Combined Feature Models and Factored Models. Combined Feature Models enrich their models with concatenations of linguistic features, whereas Factored Models view a token as a vector of factors, which makes it possible to build relatively independent models for each factor. In the context of machine translation, both approaches were expected to enrich the existing surface word model with additional linguistic information.

The research focused on finding ways to improve output translation quality for English-to-French and French-to-English translation from several standpoints. Better overall readability and understandability of a generated document should be achieved mainly by ensuring the text's fluency in the target language (syntactic correctness), its adequacy (use of adequate terminology) and its fidelity (semantic adequacy). These goals were addressed by first analysing Pharaoh’s current performance and identifying the language-specific and model-related problems encountered. Several experiments were then performed using our two approaches, and their results were compared.

Despite some improvements on a few of the linguistic issues discussed, notably fixed-expression translation and part-of-speech ambiguity, major problems involving complex syntactic structures in the source language remained a hard challenge for the approach of linguistically augmenting phrase-based statistical machine translation.
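The difference between the two token representations described above can be illustrated with a minimal sketch. It is not the thesis's own preprocessing code: the example sentence, the tag set, the choice of features (surface form, lemma, part of speech) and the `_` delimiter for the combined form are assumptions made for illustration. The `|` separator follows the convention Moses uses for factored input.

```python
from typing import List, Tuple

# Hypothetical annotated token: (surface form, lemma, part-of-speech tag).
Token = Tuple[str, str, str]


def combined_feature(tokens: List[Token], sep: str = "_") -> List[str]:
    """Concatenate surface form and POS tag into a single atomic token.

    The translation model then treats "houses_NNS" as one unit, so the
    linguistic feature is fused with the word (assumed delimiter: "_").
    """
    return [f"{surface}{sep}{pos}" for surface, _lemma, pos in tokens]


def factored(tokens: List[Token]) -> List[str]:
    """Render each token as a vector of factors separated by '|'.

    Each factor stays individually addressable, so separate models can
    be built per factor.
    """
    return ["|".join(tok) for tok in tokens]


# Illustrative sentence only; not data from the thesis.
sentence: List[Token] = [("the", "the", "DT"), ("houses", "house", "NNS")]

print(" ".join(combined_feature(sentence)))  # the_DT houses_NNS
print(" ".join(factored(sentence)))          # the|the|DT houses|house|NNS
```

In the combined form the vocabulary grows with every distinct word and tag pairing, whereas the factored form keeps each factor separate, which is what allows relatively independent models per factor.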