International Journal of Web Information Systems

Various approaches to text representation for named entity disambiguation

Abstract

Purpose - This paper addresses named entity disambiguation at a fine-grained level: each entity is assigned the identifier of the Wikipedia article that describes it.

Design/methodology/approach - For such fine-grained disambiguation, a correct representation of the context is crucial. The authors compare three context representations: bag of words, linguistic, and structured co-occurrence. A model for each representation is described and evaluated. The authors also investigate the possibilities of multilingual named entity disambiguation.

Findings - In this evaluation, the structured co-occurrence representation provides the best disambiguation results. The method can also be applied successfully to languages other than English.

Research limitations/implications - Despite its good results, the structured co-occurrence context representation has several limitations. It trades precision for recall, which might not be desirable in some use cases. It also cannot disambiguate two entities of different types that are mentioned under the same name in the same text. These limitations can be overcome by combining it with the other described methods.

Practical implications - The authors provide a ready-made web service that can be plugged directly into existing applications through a REST interface.

Originality/value - The paper proposes a new approach to named entity disambiguation that exploits various context representation models (bag of words, linguistic and structural representation). The authors constructed a comprehensive dataset for named entity disambiguation based on all English Wikipedia articles, evaluated and compared the individual context representation models on this dataset, and evaluated support for multiple languages.
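To make the idea of a bag of words context representation concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each candidate entity comes with a short description text and that the candidate whose description is most similar (by cosine similarity of word counts) to the text around the mention is chosen. The function names, the candidate identifiers and the description texts are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(context, candidates):
    """Return the candidate identifier whose description best matches the context."""
    context_vec = bag_of_words(context)
    return max(
        candidates,
        key=lambda cid: cosine(context_vec, bag_of_words(candidates[cid])),
    )


# Hypothetical candidates for the surface form "Jaguar"; the texts are
# illustrative stand-ins for article content, not real Wikipedia data.
CANDIDATES = {
    "Jaguar": "large cat species native to the americas jungle predator",
    "Jaguar_Cars": "british luxury car manufacturer vehicles engine",
}
```

Calling `disambiguate("She bought a luxury car with a powerful engine", CANDIDATES)` selects the car manufacturer, because only that description shares words with the context. A sketch like this ignores word order and syntax entirely, which is the kind of limitation that richer context representations aim to address.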
