A pervasive problem facing many biomedical text mining applications is that of correctly associating mentions of entities in the literature with corresponding concepts in a database or ontology. Attempts to build systems that automate this process have shown promise, as demonstrated by the recent BioCreAtIvE Task 1B evaluation. A significant obstacle to improved performance on this task, however, is a lack of high-quality training data. In this work, we explore methods for improving the quality of (noisy) Task 1B training data using variants of weakly supervised learning methods. We present positive results demonstrating that these methods improve training data quality, as measured by improved system performance over the same system trained on the originally labeled data.