Venue: Asian Conference on Intelligent Information and Database Systems

A Hybrid Method for Named Entity Recognition on Tweet Streams

Abstract

Information extraction from microblogs has recently attracted researchers in the fields of knowledge discovery and data mining, owing to the short nature of the posts. Annotating data is one of the main obstacles to applying machine learning approaches to these sources. Active learning (AL) and semi-supervised learning (SSL) are two distinct approaches to reducing annotation costs: SSL exploits samples that the model labels with high confidence, whereas AL queries the most informative samples for manual labeling. Because these selections are complementary, the two approaches can produce better results when applied jointly. This paper proposes a combination of AL and SSL that reduces the labeling effort for named entity recognition (NER) on tweet streams by using both machine-labeled and manually labeled data. The AL query algorithms select the most informative samples, which are then labeled by a human annotator. In addition, a conditional random field (CRF) is chosen as the underlying model for selecting high-confidence samples. Experimental results on a tweet dataset demonstrate that the proposed method achieves promising results in reducing human labeling effort and that it can significantly improve the performance of NER systems.
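
The abstract does not state which query strategy, confidence threshold, or CRF features the paper uses, so the following is only a minimal sketch of one AL + SSL round under stated assumptions: least-confidence sampling stands in for the AL query algorithm, and a fixed threshold on sequence confidence stands in for the SSL selection. The names `hybrid_round`, `toy_predict`, `query_size`, and `ssl_threshold` are hypothetical, and `toy_predict` is a hand-made stand-in for a trained CRF, not a real model.

```python
from typing import Callable, Dict, List, Tuple

Sentence = List[str]                                     # tokens of one tweet
Tags = List[str]                                         # one tag per token
Predictor = Callable[[Sentence], Tuple[Tags, float]]     # (tags, confidence)


def hybrid_round(
    pool: List[Sentence],
    predict: Predictor,
    query_size: int = 1,
    ssl_threshold: float = 0.9,
) -> Tuple[List[Sentence], List[Tuple[Sentence, Tags]], List[Sentence]]:
    """Split an unlabeled pool into human-labeled, machine-labeled, and leftover.

    Assumed strategy (not taken from the paper):
      * AL: the `query_size` least-confident tweets go to the human annotator.
      * SSL: remaining tweets with confidence >= `ssl_threshold` keep the
        model's own tags as machine-labeled training data.
    """
    scored = [(sent, *predict(sent)) for sent in pool]
    by_conf = sorted(scored, key=lambda item: item[2])   # ascending confidence

    to_annotate = [sent for sent, _, _ in by_conf[:query_size]]
    rest = by_conf[query_size:]

    machine_labeled = [(sent, tags) for sent, tags, conf in rest
                       if conf >= ssl_threshold]
    remaining = [sent for sent, _, conf in rest if conf < ssl_threshold]
    return to_annotate, machine_labeled, remaining


# Hypothetical confidences that a trained CRF might assign to each tweet.
TOY_CONFIDENCE: Dict[str, float] = {
    "Flying to Paris tomorrow": 0.95,
    "apple event was wild": 0.40,
    "watching Arsenal tonight": 0.55,
    "Obama speaks in Berlin": 0.97,
}


def toy_predict(sentence: Sentence) -> Tuple[Tags, float]:
    """Stand-in for a CRF: tag capitalised tokens, look up a fixed confidence."""
    tags = ["B-ENT" if tok[:1].isupper() else "O" for tok in sentence]
    return tags, TOY_CONFIDENCE[" ".join(sentence)]


pool = [key.split() for key in TOY_CONFIDENCE]
to_annotate, machine_labeled, remaining = hybrid_round(pool, toy_predict)

print("ask human:", to_annotate)
print("machine-labeled:", machine_labeled)
print("left in pool:", remaining)
```

In a full loop, the human-labeled and machine-labeled tweets would be added to the training set, the model retrained, and the round repeated on `remaining` plus newly arrived tweets. With a real CRF, the confidence could be derived from the model's sequence or marginal probabilities.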
