Statistical post-editing (SPE) has been successfully applied to RBMT systems and, with less success, to some SMT systems. This thesis investigates the impact of SPE on SMT systems. We apply SPE to an SMT system using a new context-modelling approach that preserves some aspects of source information in the second-stage translation. This technique yields mixed results and fails to consistently improve the output over the baseline. Furthermore, we compare the results to those of an RBMT+SPE system and a pure SMT system, using both automatic and human evaluation methods. Results show that while automatic evaluation metrics favour the pure SMT system, human evaluators prefer the output of the combined RBMT+SPE system. We investigate the use of machine learning methods to predict which sentences would benefit from post-editing. However, as the oracle score for the SMT and SMT+SPE pair was not much higher than the scores of the two systems alone, we instead compare two systems with a higher upper bound. Combining our analysis with machine learning techniques for quality estimation, we are able to improve the overall output by automatically selecting the best sentences from each of the SMT and RBMT+SPE systems.