Journal: Information Processing & Management

Whose story is it anyway? Automatic extraction of accounts from news articles



Abstract

Narratives are composed of stories that provide insight into social processes. To make the analysis of narratives more efficient, natural language processing (NLP) methods have been employed to automatically extract information from textual sources, e.g., newspaper articles. Existing work on automatic narrative extraction, however, has ignored the nested character of narratives. In this work, we argue that a narrative may contain multiple accounts given by different actors. Each individual account provides insight into the beliefs and desires underpinning an actor's actions. We present a pipeline for automatically extracting accounts, consisting of NLP methods for: (1) named entity recognition, (2) event extraction, and (3) attribution extraction. Machine learning-based models for named entity recognition were trained using a state-of-the-art neural network architecture for sequence labelling. For event extraction, we developed a hybrid approach combining semantic role labelling tools, the FrameNet repository of semantic frames, and a lexicon of event nouns. Attribution extraction, meanwhile, was addressed with the aid of a dependency parser and Levin's verb classes. To facilitate the development and evaluation of these methods, we constructed a new corpus of news articles, in which named entities, events and attributions have been manually marked up following a novel annotation scheme that covers over 20 event types relating to socio-economic phenomena. Evaluation results show that, relative to a baseline method underpinned solely by semantic role labelling tools, our event extraction approach improves recall by 12.22-14.20 percentage points (reaching as high as 92.60% on one data set). The use of Levin's verb classes in attribution extraction, meanwhile, obtains the best performance in terms of F-score, outperforming a baseline method by 7.64-11.96 percentage points.
Our proposed approach was applied to news articles focused on industrial regeneration cases. This facilitated the generation of accounts of events that are attributed to specific actors.
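
To make the idea of an "account" concrete, the following is a minimal sketch of the attribution step only: it groups reported statements by the actor they are attributed to. It is not the pipeline described in the abstract, which relies on a dependency parser, trained named entity recognition models and Levin's verb classes. Here a regular expression stands in for the parser, a run of capitalised tokens stands in for a named entity, and `REPORTING_VERBS`, `extract_accounts` and the `SAMPLE` text are all hypothetical names and data invented for illustration.

```python
import re
from collections import defaultdict

# Assumption: a tiny hand-picked list of reporting verbs. The paper draws its
# cues from Levin's verb classes; this list is only an illustrative stand-in.
REPORTING_VERBS = ("said", "stated", "claimed", "announced", "argued", "warned")

# Matches "<Source> <reporting verb> [that] <content>." where Source is a run
# of capitalised tokens (a crude stand-in for named entity recognition).
PATTERN = re.compile(
    r"\b(?P<source>[A-Z][\w&-]*(?:\s+[A-Z][\w&-]*)*)\s+"
    r"(?P<cue>" + "|".join(REPORTING_VERBS) + r")\s+(?:that\s+)?"
    r"(?P<content>[^.]+)\."
)


def extract_accounts(text):
    """Group attributed statements by their source, in order of appearance."""
    accounts = defaultdict(list)
    for match in PATTERN.finditer(text):
        accounts[match.group("source")].append(match.group("content").strip())
    return dict(accounts)


# Invented sample text, not taken from the paper's corpus.
SAMPLE = (
    "The plant closed in 2015. "
    "Acme Steel announced that it will reopen the plant. "
    "The Council said the scheme would create 200 jobs. "
    "Acme Steel warned that energy costs threaten production."
)

if __name__ == "__main__":
    for source, statements in extract_accounts(SAMPLE).items():
        print(source, "->", statements)
```

The first sentence of the sample has no reporting verb, so it is not assigned to any actor; the remaining three are split into one account per source. A surface pattern like this misses inverted or quoted attributions ("..., said the Council"), which is one reason a dependency parse is the more robust choice.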
