Examples described herein provide a computer-implemented method that includes receiving a ground truth associated with a domain cartridge, the domain cartridge comprising a plurality of hierarchical layers. The method further includes analyzing annotation blocks in relation to data present in the ground truth to detect any errors in a set of natural language processing annotators. The analyzing includes computing a recall score, a precision score, and an F1 score for each annotation block in a lowest level layer of the plurality of hierarchical layers. The analyzing further includes determining whether an error is detected at the lowest level layer of the plurality of hierarchical layers based at least in part on the recall score, the precision score, and the F1 score. The analyzing further includes terminating the analyzing responsive to determining that the error is detected at the lowest level layer of the plurality of hierarchical layers.
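The analysis described above can be illustrated with a minimal sketch. It is not the claimed implementation. Several details are assumptions made only for illustration: annotations are modeled as (start, end, label) tuples, the ground truth is a mapping from block name to expected annotations, an "error" means any of the three scores falling below a hypothetical `threshold`, and the analysis moves on to the next layer when the lowest-level layer is clean. The names `score_block` and `analyze` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BlockScore:
    precision: float
    recall: float
    f1: float


def score_block(predicted: set, expected: set) -> BlockScore:
    """Compare one annotation block's output with the ground truth."""
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return BlockScore(precision, recall, f1)


def analyze(layers, ground_truth, threshold=0.8):
    """Scan layers from the lowest level upward.

    `layers` is a list of dicts ordered lowest-level first; each dict maps
    an annotation block name to the set of annotations it produced.
    Returns (layer_index, failing_blocks) for the first layer containing
    an error, or None if no layer does. The threshold rule and the move
    to higher layers are assumptions, not taken from the source.
    """
    for index, layer in enumerate(layers):
        failing = []
        for name, predicted in layer.items():
            score = score_block(predicted, ground_truth.get(name, set()))
            if min(score.precision, score.recall, score.f1) < threshold:
                failing.append((name, score))
        if failing:
            # Terminate as soon as a layer shows an error.
            return index, failing
    return None


# Hypothetical two-layer cartridge: the lowest-level block mislabels one span.
layers = [
    {"token": {(0, 4, "DRUG"), (5, 9, "DOSE")}},
    {"concept": {(0, 9, "MEDICATION")}},
]
ground_truth = {
    "token": {(0, 4, "DRUG"), (10, 14, "DOSE")},
    "concept": {(0, 9, "MEDICATION")},
}
result = analyze(layers, ground_truth)
```

In this sketch the error in the lowest-level layer stops the analysis before the higher layer is scored, on the reasoning that errors in higher layers may simply be inherited from the layer beneath them.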