Journal: International Journal of Applied Engineering Research

A Study On Named Entity Recognition For Malayalam Language Using TnT Tagger & Maximum Entropy Markov Model



Abstract

Information extraction is the process of extracting relevant data from given text documents. It is one of the most widely studied research areas in natural language processing. Named Entity Recognition (NER) deals with recognizing named entities (NEs) such as persons, organizations, and locations in text documents. The existing system uses the TnT tagger for named entity recognition; its drawback is that it has difficulty handling unknown words. Maximum entropy Markov models also have a drawback: they can suffer from the label bias problem, in which low-entropy transition distributions effectively ignore their observations. This work implements a named entity recognizer for the Malayalam language using a Maximum Entropy Markov Model (MEMM). An MEMM combines features of hidden Markov models and maximum entropy models. It represents observations as arbitrary features such as capitalization, word, PoS, and formatting. The input to the proposed NER system is any Malayalam text document from any domain. The Trigrams'n'Tags (TnT) tagger is used for part-of-speech (POS) tagging. The significance of the work is that it helps in smoothing and in handling unknown words. The system was evaluated on more than a thousand sentences. The proposed method achieves an accuracy of 82.5%.
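To make the MEMM idea in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation. The tag set, the feature templates, the hand-set weights, and the English toy sentence are all assumptions chosen for illustration; a real system would learn the weights from annotated Malayalam data. The sketch shows the two defining pieces: a maximum-entropy (softmax) distribution over the next tag, normalized separately for each previous tag, and Viterbi decoding over those local distributions.

```python
import math

# Illustrative tag set (assumption; the paper's actual tag set is not given here).
TAGS = ["O", "PER", "LOC"]

def features(prev_tag, words, i):
    """Binary features for position i, conditioned on the previous tag."""
    w = words[i]
    return [
        f"word={w.lower()}",
        f"prev_tag={prev_tag}",
        f"is_cap={w[0].isupper()}",
    ]

# Hypothetical hand-set weights keyed by (feature, tag).
# A trained maximum-entropy model would estimate these from data.
WEIGHTS = {
    ("is_cap=True", "PER"): 1.0,
    ("is_cap=True", "LOC"): 1.0,
    ("is_cap=False", "O"): 2.0,
    ("word=in", "O"): 1.0,
    ("prev_tag=O", "LOC"): 0.5,
    ("word=kochi", "LOC"): 2.0,
    ("word=anu", "PER"): 2.0,
}

def local_probs(prev_tag, words, i):
    """P(tag | prev_tag, observation): a softmax normalized per previous tag."""
    feats = features(prev_tag, words, i)
    scores = {t: sum(WEIGHTS.get((f, t), 0.0) for f in feats) for t in TAGS}
    z = sum(math.exp(s) for s in scores.values())
    return {t: math.exp(s) / z for t, s in scores.items()}

def viterbi(words):
    """Return the most probable tag sequence under the locally normalized model."""
    # best maps a tag to (log-probability, path) of the best sequence ending in it.
    best = {"<START>": (0.0, [])}
    for i in range(len(words)):
        new_best = {}
        for prev, (logp, path) in best.items():
            for tag, p in local_probs(prev, words, i).items():
                cand = logp + math.log(p)
                if tag not in new_best or cand > new_best[tag][0]:
                    new_best[tag] = (cand, path + [tag])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

tags = viterbi(["Anu", "lives", "in", "Kochi"])
print(tags)  # ['PER', 'O', 'O', 'LOC']
```

Because each `local_probs` call normalizes over the next tag only, a previous tag with few plausible successors passes nearly all of its probability mass forward regardless of the observation. That per-state normalization is the source of the label bias problem the abstract mentions.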

