Part-of-speech tagging in real-world applications is performed on text in domains that differ from the large, publicly available training data sets. The two most successful part-of-speech taggers are trained on the Wall Street Journal corpus, a corpus of millions of words. We compare their performance on a test set from a different domain (astronomy), drawn from documents that are available on the World Wide Web. The Maximum Entropy Part of Speech Tagger (MXPOST) and the Transformation-Based Learning Tagger are well known and widely used in language research and development systems. The two taggers were tested in several modes: (1) after training on the Wall Street Journal corpus only, (2) after training on only a small body of text from our astronomy domain, (3) with and without an auxiliary lexicon derived from many astronomy-related Web documents, and (4) after incremental training, that is, having been trained on the Wall Street Journal, with additional training from the specific domain. One conclusion from the experiment is that different taggers exhibit different biases when trained on the same data