The article argues that recall and precision are imperfect measures for evaluating robust anaphora resolution algorithms. It proposes instead a success rate, defined separately for anaphora resolution algorithms and for anaphora resolution systems. The article also proposes a package of evaluation measures and tasks for anaphora resolution. The author holds that these newly added tasks, which have been carried out on Mitkov's (1998) knowledge-poor approach, give a more comprehensive picture of the performance of anaphora resolution algorithms and systems. Finally, the article outlines ongoing work on developing a consistent evaluation environment for anaphora resolution.
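A minimal Python sketch can illustrate why precision and recall say little about a robust algorithm. It assumes commonly cited definitions that the abstract itself does not state. Recall is taken as correctly resolved anaphors over all anaphors. Precision is taken as correctly resolved anaphors over the anaphors the algorithm attempted. Success rate is taken as correctly resolved anaphors over all anaphors. A robust algorithm attempts every anaphor, so under these assumptions the three values coincide. The class name, field names, and counts below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ResolutionCounts:
    """Hypothetical evaluation counts for one test text."""
    total_anaphors: int      # all anaphors present in the text
    attempted: int           # anaphors the algorithm tried to resolve
    correctly_resolved: int  # anaphors linked to a correct antecedent


def success_rate(c: ResolutionCounts) -> float:
    # Assumed definition: correctly resolved anaphors / all anaphors.
    if c.total_anaphors == 0:
        raise ValueError("no anaphors to evaluate")
    return c.correctly_resolved / c.total_anaphors


def recall(c: ResolutionCounts) -> float:
    # Assumed definition: correctly resolved anaphors / all anaphors.
    if c.total_anaphors == 0:
        raise ValueError("no anaphors to evaluate")
    return c.correctly_resolved / c.total_anaphors


def precision(c: ResolutionCounts) -> float:
    # Assumed definition: correctly resolved anaphors / attempted anaphors.
    if c.attempted == 0:
        raise ValueError("no anaphors were attempted")
    return c.correctly_resolved / c.attempted


# A non-robust algorithm skips some anaphors, so precision and recall differ.
partial = ResolutionCounts(total_anaphors=100, attempted=80, correctly_resolved=60)

# A robust algorithm attempts every anaphor, so attempted == total_anaphors
# and precision, recall, and success rate all reduce to the same ratio.
robust = ResolutionCounts(total_anaphors=100, attempted=100, correctly_resolved=70)

if __name__ == "__main__":
    print("partial:", precision(partial), recall(partial))
    print("robust:", precision(robust), recall(robust), success_rate(robust))
```

For the robust case, two separate figures add nothing over a single ratio, which matches the abstract's motivation for reporting one success rate.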