Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence

NELasso: Group-Sparse Modeling for Characterizing Relations Among Named Entities in News Articles


Abstract

Named entities such as people, locations, and organizations play a vital role in characterizing online content. They often reflect information of interest and are frequently used in search queries. Although named entities can be detected reliably from textual content, extracting relations among them is more challenging, yet useful in various applications (e.g., news recommendation systems). In this paper, we present a novel model and system for learning semantic relations among named entities from collections of news articles. We model each named entity occurrence with sparse structured logistic regression, and consider the words (predictors) to be grouped based on background semantics. This sparse group LASSO approach drives the weights of word groups that do not influence the prediction toward zero. The resulting sparse structure is used to define the type and strength of relations. Our unsupervised system yields a network of named entities in which each relation is typed, quantified, and characterized in context. These relations are the key to understanding news material over time and customizing newsfeeds for readers. Extensive evaluation of our system on articles from TIME magazine and BBC News shows that the learned relations correlate with static semantic relatedness measures like WLM, and capture the evolving relationships among named entities over time.
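The abstract does not give the exact objective, the word grouping, or the solver, so the following is only a minimal sketch of the general technique it names: logistic regression with a sparse group LASSO penalty, fitted here by proximal gradient descent. The function names (`sgl_prox`, `fit_sgl_logistic`), the penalty weights, the square-root group scaling, and the toy data are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def sgl_prox(w, groups, step, lam_l1, lam_group):
    """Proximal step for the penalty
    lam_l1 * ||w||_1 + lam_group * sum_g sqrt(|g|) * ||w_g||_2
    (hypothetical weighting; the paper's exact penalty is not given here)."""
    out = np.zeros_like(w)
    for idx in groups:
        # Element-wise soft-thresholding handles the L1 part.
        v = np.sign(w[idx]) * np.maximum(np.abs(w[idx]) - step * lam_l1, 0.0)
        norm = np.linalg.norm(v)
        thresh = step * lam_group * np.sqrt(len(idx))
        # Group soft-thresholding: a group whose norm is below the
        # threshold is set to zero as a whole.
        if norm > thresh:
            out[idx] = (1.0 - thresh / norm) * v
    return out

def fit_sgl_logistic(X, y, groups, lam_l1=0.01, lam_group=0.05, n_iter=300):
    """Proximal gradient descent on the mean logistic loss plus the penalty."""
    n, d = X.shape
    w = np.zeros(d)
    # Step size 1/L, where L = ||X||_2^2 / (4n) bounds the curvature
    # of the mean logistic loss.
    step = 4.0 * n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
        grad = X.T @ (p - y) / n             # gradient of the smooth part
        w = sgl_prox(w - step * grad, groups, step, lam_l1, lam_group)
    return w

# Toy data (assumed): 6 "word" features in 3 groups of 2; the target
# "entity occurs" label depends only on the first group.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
w = fit_sgl_logistic(X, y, groups)
group_norms = [float(np.linalg.norm(w[g])) for g in groups]
```

In a setting like the one the abstract describes, each group would hold words that share some background semantics, and the per-group norms of the fitted weights (here `group_norms`) are one plausible way to read off which groups matter for predicting an entity's occurrence.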
