Conference: International Conference on Theory and Practice of Digital Libraries

Person-Centric Mining of Historical Newspaper Collections

Abstract

We present a text mining environment that supports entity-centric mining of terascale historical newspaper collections. Information about entities and their relations to each other is often crucial for historical research. However, most text mining tools offer only basic support for entities, typically limited to entity tagging. Historians, by contrast, are usually interested in the relations between entities and the contexts in which they are mentioned. In this paper, we focus on person entities. We give an overview of the tool and describe how person-centric mining can be integrated into a general-purpose text mining environment. We also discuss our approach to automatically extracting person networks from newspaper archives. It includes a novel method for person name disambiguation that is particularly suited to the newspaper domain and obtains state-of-the-art disambiguation results.