Previous studies have proposed top-down and bottom-up approaches for creating taxonomies of errors in chat-oriented dialogue systems. However, the reported kappa (κ) value for the taxonomy based on the top-down approach is low, at 0.239, and the taxonomy based on the bottom-up approach has not been evaluated. In this paper, we propose revisions to these taxonomies to achieve better inter-annotator agreement. The revised taxonomy based on the bottom-up approach yielded a reasonable Fleiss' κ of 0.44, suggesting that it can be used reliably to classify errors in chat-oriented dialogue systems.
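For readers unfamiliar with the agreement statistic cited above, the following is a minimal sketch of how Fleiss' κ is commonly computed from a table of annotation counts. It is not the authors' code: the function name `fleiss_kappa`, the input layout (one row per annotated utterance, one column per error category, each cell holding the number of annotators who chose that category), and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table of per-item category counts.

    counts[i][j] is the number of annotators who assigned item i to
    category j. Every item must be rated by the same number of
    annotators (hypothetical layout, chosen for this sketch).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two annotators per item are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")
    n_categories = len(counts[0])

    # Proportion of all assignments that fall into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]

    # Observed agreement for each item: fraction of annotator pairs that agree.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (p_bar - p_e) / (1.0 - p_e)


if __name__ == "__main__":
    # Toy data (made up): 4 utterances, 3 annotators, 3 error categories.
    toy = [
        [3, 0, 0],
        [0, 3, 0],
        [1, 1, 1],
        [0, 2, 1],
    ]
    print(f"Fleiss' kappa = {fleiss_kappa(toy):.3f}")
```

A value of 1 indicates perfect agreement, while values at or below 0 indicate agreement no better than chance, which is the scale on which the 0.239 and 0.44 figures above are read.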