Named entity recognition is a challenging task that has traditionally required large amounts of knowledge in the form of feature engineering and lexicons to achieve high performance. In this paper, we present a novel neural network architecture that automatically detects word- and character-level features using a hybrid bidirectional LSTM and CNN architecture, eliminating the need for most feature engineering. We also propose a novel method of encoding partial lexicon matches in neural networks and compare it to existing approaches. Extensive evaluation shows that, given only tokenized text and publicly available word embeddings, our system is competitive on the CoNLL-2003 dataset and surpasses the previously reported state-of-the-art performance on the OntoNotes 5.0 dataset by 2.13 F1 points. By using two lexicons constructed from publicly available sources, we establish new state-of-the-art performance with an F1 score of 91.62 on CoNLL-2003 and 86.28 on OntoNotes, surpassing systems that employ heavy feature engineering, proprietary lexicons, and rich entity linking information.