Named Entity Recognition (NER) is an important task in many NLP pipelines. It has become especially important for the knowledge bases that power many of today's information retrieval systems. To cope with the high demand for annotated training corpora for supervised NER systems, automatic generation approaches have been proposed. In this paper we report on the first automatically generated NE-annotated corpus for Albanian. News articles from Albanian news media were used as the document source. They were automatically tagged using a gazetteer custom-generated from the Albanian Wikipedia. Our evaluation results show that this corpus can be used as a baseline for human-annotated corpora or as a training corpus where no other is available.
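To illustrate the general idea of gazetteer-based tagging, the following is a minimal sketch and not the pipeline used in the paper. The gazetteer entries, the entity labels (`PER`, `LOC`, `ORG`), the BIO tagging scheme, the greedy longest-match strategy, and the function name `tag_tokens` are all assumptions made for illustration; the abstract does not specify any of them.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer mapping token sequences to entity types.
# A real one would be derived from Albanian Wikipedia; these entries are illustrative only.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("Tirana",): "LOC",
    ("Ismail", "Kadare"): "PER",
    ("Universiteti", "i", "Tiranës"): "ORG",
}


def tag_tokens(
    tokens: List[str], gazetteer: Dict[Tuple[str, ...], str]
) -> List[Tuple[str, str]]:
    """Tag tokens with BIO labels using greedy longest-match gazetteer lookup."""
    max_len = max((len(entry) for entry in gazetteer), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(tokens[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return list(zip(tokens, tags))


if __name__ == "__main__":
    sentence = ["Ismail", "Kadare", "vizitoi", "Tirana"]
    for token, tag in tag_tokens(sentence, GAZETTEER):
        print(f"{token}\t{tag}")
```

Exact string matching of this kind is case-sensitive and ignores inflected forms, so a practical system for a morphologically rich language such as Albanian would likely need normalization or lemmatization before lookup.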