On active annotation for named entity recognition

Ekbal Asif; Saha Sriparna; Sikdar Utpal Kumar

Abstract

A major constraint on machine learning techniques for information extraction problems is the availability of a sufficient number of training examples, which are costly and laborious to prepare. Active learning techniques select informative instances from the unlabeled data and add them to the training set in such a way that the overall classification performance improves. In the random sampling approach, unlabeled data are selected for annotation at random, and so it cannot yield the desired results. In contrast, active learning selects the useful data from a huge pool of unlabeled documents. The strategies used often assign instances to incorrect classes. The classifier is confused between two classes if the test instance is located near the margin. We propose two methods for active learning and show that these techniques result in increased performance. The first approach is based on a support vector machine (SVM), whereas the second is based on ensemble learning that utilizes the classification capabilities of two well-known classifiers, namely SVM and conditional random field (CRF). The motivation for using these classifiers is that they are orthogonal in nature, so a combination of them can produce better results. To show the efficacy of the proposed approach, we choose a crucial problem, named entity recognition (NER), in three languages: Bengali, Hindi and English. The approach is also evaluated for NER in the biomedical domain. Evaluation results reveal that the proposed techniques indeed show considerable performance improvements.
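The abstract notes that a classifier is least certain about instances located near the margin. A minimal sketch of that general idea follows: unlabeled instances are ranked by their absolute distance from a linear decision boundary, and the closest ones are sent for annotation. This is an illustration of margin-based selection in general, not the selection procedure of the paper, which the abstract does not specify. The function names, the weights, and the pool values are all hypothetical.

```python
from typing import List, Sequence, Tuple


def decision_score(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Signed score w.x + b of a linear classifier; 0 lies on the decision boundary."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def select_near_margin(
    pool: List[Sequence[float]],
    weights: Sequence[float],
    bias: float,
    k: int,
) -> List[Tuple[int, float]]:
    """Return (pool index, score) for the k unlabeled instances closest to the boundary."""
    scored = [(i, decision_score(weights, bias, x)) for i, x in enumerate(pool)]
    # Smallest |score| first: these are the instances the classifier is least sure about.
    scored.sort(key=lambda pair: abs(pair[1]))
    return scored[:k]


# Hypothetical trained linear model and unlabeled pool (illustrative values only).
weights, bias = [1.0, -1.0], 0.0
pool = [[3.0, 0.0], [1.0, 0.5], [0.0, 4.0], [2.0, 2.5]]

selected = select_near_margin(pool, weights, bias, k=2)
selected_indices = [i for i, _ in selected]
print(selected_indices)
```

In an active learning loop, the selected instances would be annotated, added to the training set, and the classifier retrained before the next round of selection.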

Bibliographic details

Source
International Journal of Machine Learning and Cybernetics | 2016, Issue 4 | pp. 623-640 | 18 pages
Authors
Ekbal Asif; Saha Sriparna; Sikdar Utpal Kumar
Author affiliations

Indian Inst Technol Patna, Dept Comp Sci & Engn, Patna 800013, Bihar, India;

Indian Inst Technol Patna, Dept Comp Sci & Engn, Patna 800013, Bihar, India;

Indian Inst Technol Patna, Dept Comp Sci & Engn, Patna 800013, Bihar, India;

Original format: PDF
Language of text: English
Keywords
Named entity recognition (NER); Active learning; Conditional random field (CRF); Support vector machine; Classifier ensemble; Biomedical domain;

Similar literature

1. Partial annotation scheme for active learning on named entity recognition tasks [J]. Koga Kobayashi, Kei Wakabayashi. Journal of Data Intelligence. 2020, Issue 3
2. An active learning-enabled annotation system for clinical named entity recognition [J]. Yukun Chen, Thomas A. Lask, Qiaozhu Mei. BMC Medical Informatics and Decision Making. 2017, Issue 2
3. Evaluating the impact of pre-annotation on annotation speed and potential bias: natural language processing gold standard development for clinical named entity recognition in clinical trial announcements [J]. Todd Lingren, Louise Deleger, Katalin Molnar. Journal of the American Medical Informatics Association. 2014, Issue 3
4. Ensemble based active annotation for biomedical named entity recognition [C]. Verma Mridula, Sikdar Utpal, Saha Sriparna. International Conference on Advances in Computing, Communications and Informatics. 2013
5. Semi-supervised Named Entity Recognition: Learning to recognize 100 entity types with little supervision [D]. Nadeau, David. 2007
6. An active learning-enabled annotation system for clinical named entity recognition [O]. Yukun Chen, Thomas A. Lask, Qiaozhu Mei. 2017
7. An active learning-enabled annotation system for clinical named entity recognition [O]. Chen, Yukun, Lask, Thomas A, Mei, Qiaozhu. 2017
