Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for receiving a plurality of word strings in a first language, each received word string comprising a plurality of words, identifying one or more named entities in each received word string using a statistical classifier that was trained using training data comprising a plurality of features, wherein one of the features is a word shape feature that comprises a respective token for each letter of a respective word wherein each token signifies a case of the letter or whether the letter is a digit, and translating the received word strings from the first language to a second language including preserving the respective identified named entities in each received word string during translation.
展开▼