A computer-implemented method and system for automatically creating an annotated dataset. An automatic annotating system may access a proprietary database and an unannotated dataset and identify tokens, or character spans, of the unannotated dataset that match property values in the database. The automatic annotating system may then determine whether the identified tokens in the unannotated dataset originated, or were derived, from the database by calculating probabilities using a language model and a Bayesian network. The automatic annotating system annotates the identified tokens determined to originate from the database by associating a tag with each identified token and assigning annotation attributes to each tag. The annotations and associated properties and values are stored as an annotated dataset. The annotated dataset may then be used to train automated, machine-learned models to identify and tag other datasets.
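The flow described above (match database property values in raw text, score each match, keep the likely ones as tagged spans) can be illustrated with a minimal sketch. It is not the claimed implementation: the names `Annotation`, `find_matches`, `posterior_from_database`, and `annotate` are hypothetical, the language model is a toy lookup of made-up word probabilities, and a single application of Bayes' rule stands in for the Bayesian network mentioned in the abstract.

```python
import re
from dataclasses import dataclass


@dataclass
class Annotation:
    start: int          # character offset where the span begins
    end: int            # character offset just past the span
    text: str           # the matched character span
    tag: str            # database property the span is tagged with
    probability: float  # estimated probability the span came from the database


def find_matches(text, database):
    """Yield (start, end, property) for every database value found in the text."""
    for prop, values in database.items():
        for value in values:
            pattern = r"\b" + re.escape(value) + r"\b"
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                yield m.start(), m.end(), prop


def posterior_from_database(p_span_given_db, p_span_given_lm, prior_db):
    """Bayes' rule over two hypotheses: database origin vs. ordinary language."""
    numerator = p_span_given_db * prior_db
    denominator = numerator + p_span_given_lm * (1.0 - prior_db)
    return numerator / denominator if denominator > 0 else 0.0


def annotate(text, database, lm_prob, prior_db=0.5, threshold=0.5):
    """Tag matched spans whose posterior probability reaches the threshold."""
    annotations = []
    for start, end, prop in find_matches(text, database):
        span = text[start:end]
        # Assumption: values of a property are equally likely under the database.
        p_db = 1.0 / len(database[prop])
        p_lm = lm_prob(span.lower())
        p = posterior_from_database(p_db, p_lm, prior_db)
        if p >= threshold:
            annotations.append(Annotation(start, end, span, prop, p))
    return sorted(annotations, key=lambda a: a.start)


# Toy inputs, invented for illustration only.
database = {"artist": ["Prince", "Madonna"], "song": ["Yesterday", "Hello"]}
toy_lm = {"yesterday": 0.6, "madonna": 0.0001}


def lm_prob(word):
    # Hypothetical stand-in for a real language model.
    return toy_lm.get(word, 1e-6)


result = annotate("Play Yesterday by Madonna", database, lm_prob)
for a in result:
    print(a)
```

With these invented numbers, "Yesterday" matches a song title but is also a likely ordinary word, so its posterior falls below the threshold and it is left untagged, while "Madonna" is tagged as an `artist`. A real system would replace `lm_prob` with a trained language model and the two-hypothesis rule with a fuller probabilistic model.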