Example credibility measures how much a classifier can trust a training example when building a classification model. It is given by a credibility function, which is application dependent and is estimated from a set of factors that influence how trustworthy each example is. Here we address automatic document classification and study the credibility of a document according to three factors: content, authorship, and citations. We propose a genetic programming (GP) algorithm to estimate the credibility of training examples and then incorporate this estimate into a credibility-aware classifier. To that end, we model the authorship and citation data as a complex network and select a set of structural metrics that can be used to estimate credibility. These metrics are then combined with content-related ones and used as terminals for the GP. The GP was tested on a subset of the ACM Digital Library (ACM-DL), and the credibility-aware classifier achieved micro-F1 and macro-F1 scores 5% to 8% better than those of traditional classifiers.
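As a rough illustration of the idea, the sketch below shows one way a credibility function over per-document terminals could weight training examples inside a classifier. Everything in it is an assumption made for illustration: the terminal names `author_degree` and `citations_in`, the hand-written `credibility` expression (standing in for a tree that a GP would evolve), and the choice of a multinomial Naive Bayes with Laplace smoothing as the underlying classifier. The abstract does not specify the actual terminals, GP function set, or classifier.

```python
import math
from collections import defaultdict

def credibility(doc):
    # Hypothetical candidate expression over two network terminals.
    # A GP would search over many such expressions; this one is fixed
    # by hand and is always >= 1, so the weights stay positive.
    return 1.0 + math.log1p(doc["author_degree"]) + math.log1p(doc["citations_in"])

def uniform(doc):
    # Baseline: every training example is trusted equally.
    return 1.0

def train(docs, cred):
    # Naive Bayes where each document contributes its credibility
    # instead of a unit count to the class and term statistics.
    class_weight = defaultdict(float)
    term_weight = defaultdict(lambda: defaultdict(float))
    vocab = set()
    for doc in docs:
        w = cred(doc)
        class_weight[doc["label"]] += w
        for term in doc["terms"]:
            term_weight[doc["label"]][term] += w
            vocab.add(term)
    return class_weight, term_weight, vocab

def predict(model, terms):
    class_weight, term_weight, vocab = model
    total = sum(class_weight.values())
    best_label, best_score = None, -math.inf
    for label, cw in class_weight.items():
        # Laplace-smoothed, credibility-weighted term likelihoods.
        denom = sum(term_weight[label].values()) + len(vocab)
        score = math.log(cw / total)
        for term in terms:
            score += math.log((term_weight[label].get(term, 0.0) + 1.0) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training set; labels, terms, and metric values are made up.
docs = [
    {"terms": ["graph", "network"], "label": "networks",
     "author_degree": 3, "citations_in": 5},
    {"terms": ["query", "index"], "label": "databases",
     "author_degree": 2, "citations_in": 1},
    {"terms": ["graph", "index"], "label": "databases",
     "author_degree": 0, "citations_in": 0},
]

weighted_label = predict(train(docs, credibility), ["graph"])
baseline_label = predict(train(docs, uniform), ["graph"])
print(weighted_label, baseline_label)
```

On this toy data the two models disagree on the query term "graph": the weighted model trusts the well-connected, well-cited document more than the isolated one. In a GP setting, each candidate expression would presumably be scored by the validation quality of the classifier it induces, but the fitness function used in the paper is not stated in the abstract.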