This paper examines two techniques of manual evaluation that can be used to identify error types of individual machine translation systems. The first technique, "blind post-editing", has been used in WMT evaluation campaigns since 2009, and manually constructed data of this type are available for various language pairs. The second technique, explicit marking of errors, has been used in the past as well. We propose a method for interpreting blind post-editing data at a finer level and compare the results with explicit marking of errors. While the human annotation obtained with either technique is not exactly reproducible (inter-annotator agreement is relatively low), both techniques lead to similar observations about the differences between the systems. Specifically, we are able to suggest which errors in MT output are easy and which are hard to correct without access to the source, a situation experienced by users who do not understand the source language.