A Study On Named Entity Recognition For Malayalam Language Using TnT Tagger & Maximum Entropy Markov Model

Shruthi S.; Jiljo; Pranav P. V.

Abstract

Information extraction (IE) is the process of extracting relevant data from given text documents, and it is one of the most widely researched areas in natural language processing. Named entity recognition (NER) deals with recognizing named entities (NEs), such as persons, organizations, and locations, in text documents. In the existing system, the TnT tagger is used for named entity recognition. Its drawback is that it has problems handling unknown words. Maximum entropy Markov models (MEMMs) also have a drawback: they can suffer from the label bias problem, because states with low-entropy transition distributions effectively ignore their observations.

This work implements a named entity recognizer for the Malayalam language using an MEMM, which combines features of hidden Markov models and maximum entropy models. It represents observations as arbitrary features such as the word itself, capitalization, part-of-speech (POS) tag, and formatting. The input to the proposed NER system is any Malayalam text document from any domain. The Trigrams'n'Tags (TnT) tagger is used for POS tagging. The significance of the work is that it helps with smoothing and with handling unknown words. The system was evaluated on more than a thousand sentences, and the proposed methodology achieved an accuracy of 82.5%.
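The following is a minimal sketch of the general MEMM idea described in the abstract. It is not the authors' implementation. Each token's label is predicted by a maximum-entropy (softmax) classifier conditioned on the previous label and on features of the observation, which is the HMM-plus-maximum-entropy combination. Viterbi decoding then picks the best label sequence. Several details are assumptions for illustration: the tag set (`PER`, `LOC`, `O`), the three feature templates, the SGD settings, and the two toy Malayalam sentences. The paper's own features (capitalization, word, POS, formatting) would replace `features`. For example, a `pos=` feature from a TnT tagger could be added there, but that is not shown.

```python
import math
from collections import defaultdict

# Hypothetical tag set: person, location, outside any entity.
LABELS = ["PER", "LOC", "O"]
START = "<s>"  # pseudo-label used as the "previous label" of the first token


def features(words, i, prev_label):
    """Hypothetical feature templates: a bias, the current word, the previous label."""
    return ["bias", "w=" + words[i], "prev=" + prev_label]


def local_probs(weights, feats):
    """Maximum-entropy (softmax) distribution P(y | previous label, observation)."""
    scores = [sum(weights[(f, y)] for f in feats) for y in LABELS]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return {y: e / z for y, e in zip(LABELS, exps)}


def train(corpus, epochs=30, lr=0.5):
    """Per-token SGD on the log-likelihood of the local maximum-entropy classifier."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, tags in corpus:
            prev = START
            for i, gold in enumerate(tags):
                feats = features(words, i, prev)
                probs = local_probs(weights, feats)
                for y in LABELS:
                    # d log P(gold) / d weight[(f, y)] = 1[y == gold] - P(y) for active f
                    grad = (1.0 if y == gold else 0.0) - probs[y]
                    for f in feats:
                        weights[(f, y)] += lr * grad
                prev = gold  # training conditions on the gold previous label
    return weights


def viterbi(weights, words):
    """Label sequence maximizing the product of the local MEMM probabilities."""
    best = {START: (0.0, [])}  # label -> (log-probability, best path ending there)
    for i in range(len(words)):
        new_best = {}
        for prev, (score, path) in best.items():
            probs = local_probs(weights, features(words, i, prev))
            for y in LABELS:
                cand = score + math.log(probs[y])
                if y not in new_best or cand > new_best[y][0]:
                    new_best[y] = (cand, path + [y])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Toy corpus (hypothetical): "Raman went to Kochi", "Sita lives in Thrissur".
corpus = [
    (["രാമൻ", "കൊച്ചിയിലേക്ക്", "പോയി"], ["PER", "LOC", "O"]),
    (["സീത", "തൃശ്ശൂരിൽ", "താമസിക്കുന്നു"], ["PER", "LOC", "O"]),
]
weights = train(corpus)
print(viterbi(weights, corpus[0][0]))  # expected: ['PER', 'LOC', 'O']
```

Because `local_probs` normalizes each step separately, a previous label must pass all of its probability mass on to the next step, however poorly the next observation fits. This per-state normalization is what gives rise to the label bias problem noted in the abstract.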

Bibliographic details

Source
International Journal of Applied Engineering Research, 2016, No. 1, 5 pages
Authors
Shruthi S.; Jiljo; Pranav P. V.
Original format: PDF
Language: English
CLC classification: Engineering basic sciences
Keywords
Malayalam training corpus; NE; NER; TnT; IE; MEMM

Similar documents

1. Named Entity Recognition for Telugu Using Maximum Entropy Model. G. V. S. Raju, B. Srinivasu, S. Viswanadha Raju. Journal of Theoretical and Applied Information Technology, 2010, No. 2.
2. TaggerOne: joint named entity recognition and normalization with semi-Markov Models. Robert Leaman, Zhiyong Lu. Bioinformatics, 2016, No. 18.
3. Maximum Entropy Named Entity Recognition for Czech Language. Michal Konkol, Miloslav Konopik. Text, Speech and Dialogue, 2011.
4. A maximum entropy approach to named entity recognition. Andrew Eliot Borthwick. Dissertation, 1999.
5. A comparative study of word representation methods with conditional random fields and maximum entropy Markov for bio-named entity recognition. Maan Tareq Abd, Masnizah Mohd. 2018.
