Journal: International Journal of Machine Learning and Cybernetics

On active annotation for named entity recognition

Abstract

A major constraint on machine learning techniques for information extraction problems is the availability of a sufficient number of training examples, which are costly and laborious to prepare. Active learning techniques select informative instances from the unlabeled data and add them to the training set in such a way that overall classification performance improves. In the random sampling approach, unlabeled data is selected for annotation at random and thus cannot yield the desired results. In contrast, active learning selects useful data from a large pool of unlabeled documents. The selection strategies in common use often assign instances to incorrect classes: the classifier is confused between two classes when a test instance lies near the margin. We propose two methods for active learning and show that they improve performance. The first approach is based on a support vector machine (SVM), whereas the second is based on ensemble learning that exploits the classification capabilities of two well-known classifiers, namely SVM and conditional random field (CRF). The motivation for using these classifiers is that they are orthogonal in nature, so combining them can produce better results. To show the efficacy of the proposed approaches, we choose a crucial problem, named entity recognition (NER), in three languages: Bengali, Hindi and English. The approaches are also evaluated for NER in the biomedical domain. Evaluation results show that the proposed techniques yield considerable performance improvements.
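
The abstract does not spell out the selection criteria, so the following is only a minimal sketch of two generic active-learning heuristics that match its description, not the authors' method. It assumes that "near the margin" means a small absolute SVM decision value, and that the ensemble variant can be approximated by picking instances on which two classifiers (for example SVM and conditional random field) disagree. The function names `select_by_margin` and `select_by_disagreement`, and all scores and labels shown, are hypothetical.

```python
from typing import List, Sequence


def select_by_margin(scores: Sequence[float], k: int) -> List[int]:
    """Return indices of the k instances whose decision value is closest to 0.

    `scores` are signed distances from a binary classifier's decision
    boundary (assumption: smaller |score| means a less certain prediction).
    """
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i]))
    return ranked[:k]


def select_by_disagreement(labels_a: Sequence[str],
                           labels_b: Sequence[str]) -> List[int]:
    """Return indices where two classifiers predict different labels."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


# Illustrative values only; these are not results from the paper.
pool_scores = [2.1, -0.05, 0.7, -1.3, 0.2]
to_annotate = select_by_margin(pool_scores, k=2)

svm_labels = ["PER", "O", "LOC"]
crf_labels = ["PER", "ORG", "LOC"]
disputed = select_by_disagreement(svm_labels, crf_labels)
```

In an active-learning loop, the selected indices would be sent to a human annotator, the newly labeled instances added to the training set, and the classifiers retrained before the next round.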