This paper presents an empirical study of the influence of singletons on the evaluation of coreference resolution systems. We report results on two English data sets, those used in the SemEval 2010 shared task 1 and the CoNLL 2011 shared task, scored with the scorers of both shared tasks. We show that singletons, both in the gold standard and in the system output, have an immense impact on the overall evaluation scores, even though the coreference resolution results themselves remain unchanged across the different settings.
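The effect can be illustrated with a minimal sketch. It assumes the B-cubed metric as one example measure (the abstract does not say which metrics are examined) and uses hypothetical toy data; the function name `b_cubed_f1` and the mention ids are illustrative and do not come from either shared-task scorer. The coreference chains are identical in both settings, and only the presence of singleton mentions differs.

```python
def b_cubed_f1(key, response):
    """B-cubed F1 for two partitions (lists of sets) over the same mentions.

    Simplifying assumption: gold (key) and system (response) contain exactly
    the same mentions, so every mention has a cluster on both sides.
    """
    key_of = {m: cluster for cluster in key for m in cluster}
    resp_of = {m: cluster for cluster in response for m in cluster}
    mentions = list(key_of)
    # Per-mention overlap between its gold cluster and its system cluster,
    # normalised by system cluster size (precision) or gold size (recall).
    precision = sum(
        len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions
    ) / len(mentions)
    recall = sum(
        len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions
    ) / len(mentions)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy document: two gold chains, and a system that misplaces "c".
gold_chains = [{"a", "b", "c"}, {"d", "e"}]
system_chains = [{"a", "b"}, {"c", "d", "e"}]

# Four singleton mentions, present identically in gold and system output.
singletons = [{"s1"}, {"s2"}, {"s3"}, {"s4"}]

f1_without = b_cubed_f1(gold_chains, system_chains)
f1_with = b_cubed_f1(gold_chains + singletons, system_chains + singletons)

print(f"B-cubed F1 without singletons: {f1_without:.3f}")
print(f"B-cubed F1 with singletons:    {f1_with:.3f}")
```

In this toy case the score rises from about 0.73 to about 0.85 once the singletons are counted, although no coreference decision has changed: each singleton contributes a perfect per-mention score and dilutes the errors made on the actual chains.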