Journal: Computational Linguistics

Unsupervised Event Coreference Resolution


Abstract

The task of event coreference resolution plays a critical role in many natural language processing applications such as information extraction, question answering, and topic detection and tracking. In this article, we describe a new class of unsupervised, nonparametric Bayesian models for probabilistically inferring coreference clusters of event mentions from a collection of unlabeled documents. To infer these clusters, we automatically extract various lexical, syntactic, and semantic features for each event mention from the document collection. Extracting a rich set of features for each event mention allows us to cast event coreference resolution as the task of grouping together the mentions that share the same features (they have the same participating entities, share the same location, happen at the same time, etc.).

Some of the most important challenges posed by resolving event coreference in an unsupervised way stem from (a) the choice of representing event mentions through a rich set of features and (b) the ability to model events described both within the same document and across multiple documents. Our first unsupervised model that addresses these challenges is a generalization of the hierarchical Dirichlet process. This new extension retains the hierarchical Dirichlet process's ability to capture the uncertainty regarding the number of clustering components and, additionally, takes into account any finite number of features associated with each event mention. Furthermore, to overcome some of the limitations of this extension, we devised a new hybrid model, which combines an infinite latent class model with a discrete time series model. The main advantage of this hybrid model lies in its ability to automatically infer the number of features associated with each event mention from data and, at the same time, to automatically select the most informative features for the task of event coreference. The evaluation performed on both within- and cross-document event coreference shows that these models improve significantly over two baselines for this task.
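The idea of grouping mentions that share feature values, without fixing the number of clusters in advance, can be illustrated with a much simpler model than the ones the abstract describes. The sketch below is a plain Dirichlet process mixture sampled with a collapsed Gibbs sampler (the Chinese restaurant process view); it is not the hierarchical Dirichlet process extension or the hybrid model from the article. The function name `cluster_mentions`, the parameters `alpha` and `beta`, the assumption of exactly one categorical value per feature type, and the toy mentions are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def cluster_mentions(mentions, vocab_sizes, alpha=1.0, beta=0.5, iters=50, seed=0):
    """Toy collapsed Gibbs sampler for a Dirichlet process mixture.

    mentions    : list of tuples, one categorical value per feature type
    vocab_sizes : number of possible values for each feature type
    alpha       : concentration (higher -> more clusters), assumed value
    beta        : symmetric Dirichlet smoothing per feature, assumed value
    """
    rng = random.Random(seed)
    labels = list(range(len(mentions)))  # start with one cluster per mention
    size = defaultdict(int)              # cluster id -> number of mentions
    counts = defaultdict(int)            # (cluster id, feature index, value) -> count
    for i, mention in enumerate(mentions):
        size[labels[i]] += 1
        for f, value in enumerate(mention):
            counts[(labels[i], f, value)] += 1
    next_id = len(mentions)              # fresh id for a newly opened cluster

    for _ in range(iters):
        for i, mention in enumerate(mentions):
            # Remove mention i from its current cluster.
            old = labels[i]
            size[old] -= 1
            for f, value in enumerate(mention):
                counts[(old, f, value)] -= 1
            if size[old] == 0:
                del size[old]

            # Weight of each existing cluster: size times the smoothed
            # probability of the mention's feature values in that cluster.
            candidates, weights = [], []
            for k, n_k in size.items():
                w = float(n_k)
                for f, value in enumerate(mention):
                    seen = counts.get((k, f, value), 0)
                    w *= (seen + beta) / (n_k + beta * vocab_sizes[f])
                candidates.append(k)
                weights.append(w)

            # Weight of opening a new cluster: alpha times the prior
            # probability of the feature values (uniform under the prior).
            w_new = alpha
            for v in vocab_sizes:
                w_new /= v
            candidates.append(next_id)
            weights.append(w_new)

            # Sample a cluster and add the mention back.
            new = rng.choices(candidates, weights=weights)[0]
            if new == next_id:
                next_id += 1
            labels[i] = new
            size[new] += 1
            for f, value in enumerate(mention):
                counts[(new, f, value)] += 1
    return labels


# Hypothetical mentions described by (trigger word, location).
toy_mentions = [
    ("attack", "Baghdad"),
    ("bombing", "Baghdad"),
    ("election", "Paris"),
    ("vote", "Paris"),
]
print(cluster_mentions(toy_mentions, vocab_sizes=[4, 2]))
```

The returned list gives one cluster id per mention; mentions with the same id are treated as coreferent in this toy setting. Because the sampler is stochastic, the grouping depends on `seed`, `alpha`, and `beta`, and a real system would use far richer features than the two shown here.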
