The challenge of translation varies from one sentence to another, or even between phrases of a sentence. We investigate whether variations in difficulty can be located automatically for Statistical Machine Translation (SMT). Furthermore, we hypothesize that customization of a SMT system based on difficulty information, improves the translation quality.We assume a binary categorization for phrases: easy vs. difficult. Our focus is on the Difficult to Translate Phrases (DTPs). Our experiments show that for a sentence, improving the translation of the DTP improves the translation of the surrounding non-difficult phrases too. To locate the most difficult phrase of each sentence, we use machine learning and construct a difficulty classifier. To improve the translation of DTPs, we introduce customization methods for three components of the SMT system: I. language model; II. translation model; III. decoding weights. With each method, we construct a new component that is dedicated for the translation of difficult phrases. Our experiments on Arabic-to-English translation show that DTP-specific system customization is mostly successful.Overall, we demonstrate that translation difficulty is an important source of information for machine translation and can be used to enhance its performance.
展开▼