Information extraction is the process of extracting relevant data from text documents, and it is one of the most widely studied research areas in natural language processing. Named Entity Recognition (NER) deals with recognizing named entities (NEs) such as persons, organizations, and locations in text documents. In the existing system, the TnT tagger is used for named entity recognition; its drawback is that problems arise when handling unknown words. Maximum entropy Markov models also have a drawback: they potentially suffer from the label bias problem, in which low-entropy transition distributions effectively ignore their observations. This work implements a named entity recognizer for the Malayalam language using a Maximum Entropy Markov Model (MEMM), which combines the features of hidden Markov models and maximum entropy models. It represents observations as arbitrary features such as capitalization, word, POS, and formatting. The input to the proposed NER system is any text document in Malayalam, from any domain. The Trigrams'n'Tags (TnT) tagger is used for part-of-speech (POS) tagging. The significance of the work is that it helps in smoothing and in handling unknown words. The system was tested on more than a thousand sentences, and the proposed methodology achieved an accuracy of 82.5%.
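The MEMM formulation described above can be illustrated with a minimal sketch: each label is predicted from a maximum entropy distribution conditioned on the previous label and on features of the current observation, and the best label sequence is found by Viterbi search. Everything specific here is an assumption made for illustration and is not taken from the work itself: the label set, the feature templates in `features`, the hand-set `weights` (a real system would learn them from annotated data), and the transliterated toy tokens with their POS tags.

```python
import math

# Hypothetical label set; the abstract names person, organization, location.
LABELS = ["PER", "LOC", "O"]


def features(word, pos, prev_label):
    """Feature templates for one token (illustrative, not the paper's set)."""
    return [
        f"word={word}",
        f"pos={pos}",
        f"prev={prev_label}",
        f"prev={prev_label}|pos={pos}",
    ]


def local_probs(weights, word, pos, prev_label):
    """Maxent distribution P(label | previous label, current observation)."""
    feats = features(word, pos, prev_label)
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def viterbi(weights, tokens):
    """Return the most probable label sequence for a list of (word, pos)."""
    # best[label] = (log-probability of the best path ending in label, path)
    best = {"<s>": (0.0, [])}
    for word, pos in tokens:
        nxt = {}
        for prev, (logp, path) in best.items():
            for y, p in local_probs(weights, word, pos, prev).items():
                cand = logp + math.log(p)
                if y not in nxt or cand > nxt[y][0]:
                    nxt[y] = (cand, path + [y])
        best = nxt
    return max(best.values(), key=lambda item: item[0])[1]


# Hand-set toy weights and transliterated placeholder tokens (assumptions).
weights = {
    ("word=Raman", "PER"): 3.0,
    ("word=Kochi", "LOC"): 3.0,
    ("pos=VB", "O"): 3.0,
}
tokens = [("Raman", "NNP"), ("Kochi", "NNP"), ("poyi", "VB")]

print(viterbi(weights, tokens))
```

Because each local distribution is normalized per previous label, this sketch also makes the label bias problem mentioned above easy to see: a previous label with few plausible successors passes its probability mass on regardless of the observation.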