TwitterNEED: A hybrid approach for named entity extraction and disambiguation for tweets


Abstract

Twitter is a rich source of continuously and instantly updated information. Shortness and informality of tweets are challenges for Natural Language Processing tasks. In this paper, we present TwitterNEED, a hybrid approach for Named Entity Extraction and Named Entity Disambiguation for tweets. We believe that disambiguation can help to improve the extraction process. This mimics the way humans understand language and reduces error propagation in the whole system. Our extraction approach aims for high extraction recall first, after which a Support Vector Machine attempts to filter out false positives among the extracted candidates using features derived from the disambiguation phase in addition to other word shape and Knowledge Base features. For Named Entity Disambiguation, we obtain a list of entity candidates from the YAGO Knowledge Base in addition to top-ranked pages from the Google search engine for each extracted mention. We use a Support Vector Machine to rank the candidate pages according to a set of URL and context similarity features. For evaluation, five data sets are used to evaluate the extraction approach, and three of them to evaluate both the disambiguation approach and the combined extraction and disambiguation approach. Experiments show better results compared to our competitors DBpedia Spotlight, Stanford Named Entity Recognition, and the AIDA disambiguation system.
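
The recall-first extraction followed by a disambiguation-informed filter can be illustrated with a small sketch. This is not the authors' implementation: the toy knowledge base `TOY_KB` stands in for YAGO, the context-overlap score stands in for the paper's URL and context similarity features, and the hand-set `WEIGHTS` and `BIAS` stand in for the decision function of a trained Support Vector Machine. All names and values below are hypothetical and chosen only to make the flow of information visible.

```python
import re

# Hypothetical toy knowledge base (a stand-in for YAGO):
# surface form -> {entity name: words that typically appear in its context}
TOY_KB = {
    "paris": {
        "Paris": {"france", "city", "eiffel"},
        "Paris_Hilton": {"hotel", "celebrity", "heiress"},
    },
    "apple": {
        "Apple_Inc.": {"iphone", "mac", "ios"},
    },
}

# Hand-set linear weights standing in for a trained SVM's decision function.
WEIGHTS = {"capitalized": 0.5, "in_kb": 1.0, "disamb": 2.0}
BIAS = -1.6


def tokenize(tweet):
    """Split a tweet into word tokens."""
    return re.findall(r"\w+", tweet)


def disambiguate(mention, context):
    """Pick the KB entity whose context words overlap most with the tweet.

    Returns (entity, score), or (None, 0.0) if the mention is not in the KB.
    """
    best, best_score = None, 0.0
    for entity, words in TOY_KB.get(mention.lower(), {}).items():
        score = len(words & context) / len(words)
        if best is None or score > best_score:
            best, best_score = entity, score
    return best, best_score


def extract_and_link(tweet):
    """Recall-first extraction, then filtering with disambiguation features."""
    tokens = tokenize(tweet)
    context = {t.lower() for t in tokens}
    results = []
    for tok in tokens:  # high recall: every token starts as a candidate
        entity, score = disambiguate(tok, context)
        features = {
            "capitalized": 1.0 if tok[0].isupper() else 0.0,  # word shape
            "in_kb": 1.0 if entity is not None else 0.0,      # KB feature
            "disamb": score,                  # fed back from disambiguation
        }
        decision = BIAS + sum(WEIGHTS[k] * v for k, v in features.items())
        if decision > 0:  # keep only candidates the filter accepts
            results.append((tok, entity))
    return results


print(extract_and_link("Loved the Eiffel tower in paris, such a great city"))
print(extract_and_link("I ate an Apple today"))
```

In the first tweet the lowercase token `paris` survives the filter because its disambiguation score is high, even though its word shape gives no hint. In the second, the capitalized `Apple` is dropped because no candidate entity fits the context. That is the sense in which disambiguation evidence can correct the extraction step.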

Bibliographic details

  • Source
    Natural Language Engineering | 2016, Issue 3 | pp. 423-456 | 34 pages
  • Author affiliations

    Database, University of Twente, Enschede, the Netherlands;

    Database, University of Twente, Enschede, the Netherlands;

  • Original format: PDF
  • Language: English
