A computing method receives a first labeled sample from an annotator. The method may determine a plurality of reference model risk scores for the first labeled sample, where each reference model risk score corresponds to an amount of risk associated with adding the first labeled sample to a respective reference model of a plurality of reference models. The method may determine an overall risk score for the first labeled sample based on the plurality of reference model risk scores. The method may further determine a probe for confirmation of the first labeled sample and, by sending the probe to one or more annotators, determine a trust score for the annotator. In response to determining the trust score for the annotator, the method may add the first labeled sample to a ground truth or reject the first labeled sample.
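The flow described above can be illustrated with a minimal Python sketch. It is not the claimed implementation, and several details are assumptions made only for illustration: each reference model is represented by a hypothetical function returning a risk in [0, 1], the overall risk score is taken as the arithmetic mean of the per-model scores, the probe is simply the sample text sent to other annotators, the trust score is the fraction of probe responses that match the submitted label, and `trust_threshold` is an arbitrary cutoff. The abstract does not say how the overall risk score feeds into the decision, so the sketch only reports it.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class LabeledSample:
    text: str
    label: str


@dataclass(frozen=True)
class Review:
    overall_risk: float
    trust: float
    accepted: bool


# Hypothetical: a reference model is a function returning the risk in [0, 1]
# of adding the sample to that model.
RiskFn = Callable[[LabeledSample], float]
# Hypothetical: a probe responder returns the label another annotator gives.
Responder = Callable[[str], str]


def overall_risk_score(sample: LabeledSample, models: Sequence[RiskFn]) -> float:
    """Combine per-model risk scores (assumption: arithmetic mean)."""
    if not models:
        raise ValueError("at least one reference model is required")
    return mean(model(sample) for model in models)


def trust_score(sample: LabeledSample, responders: Sequence[Responder]) -> float:
    """Send the probe and score agreement (assumption: fraction of matches)."""
    if not responders:
        raise ValueError("at least one probe responder is required")
    answers = [respond(sample.text) for respond in responders]
    return sum(answer == sample.label for answer in answers) / len(answers)


def review_sample(
    sample: LabeledSample,
    models: Sequence[RiskFn],
    responders: Sequence[Responder],
    ground_truth: List[LabeledSample],
    trust_threshold: float = 0.5,  # assumed cutoff, not from the source
) -> Review:
    """Score the sample, probe other annotators, then add or reject it."""
    risk = overall_risk_score(sample, models)
    trust = trust_score(sample, responders)
    accepted = trust >= trust_threshold
    if accepted:
        ground_truth.append(sample)
    return Review(overall_risk=risk, trust=trust, accepted=accepted)


# Usage with stand-in models and responders.
sample = LabeledSample(text="reset my password", label="account")
models = [lambda s: 0.25, lambda s: 0.75]
responders = [lambda t: "account", lambda t: "account", lambda t: "billing"]
ground_truth: List[LabeledSample] = []
result = review_sample(sample, models, responders, ground_truth)
```

With these stand-in values, two of the three probe responses agree with the submitted label, so the sample clears the assumed threshold and is appended to `ground_truth`; lowering the agreement or raising the threshold would cause it to be rejected instead.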