International Journal of Population Data Science

Advanced methods for linking complex historical birth, death, marriage and census data



Abstract

Objective: Recent years have seen the development of novel techniques for linking complex types of data that contain records about different types of entities, for example bibliographic databases with records about authors, publications, and venues. Advanced approaches have been devised to link individuals and groups of records. These approaches exploit both the similarities between record attributes and the relationships between entities. Rather than linking records about different types of entities, in this work we study the novel problem of linking records where the same entity can have different roles and where these roles can change over time.

Approach: We specifically develop novel techniques for linking historical birth, death, marriage, and census certificates with the aim of reconstructing the population covered by these certificates over a period of time. Our techniques make use of constraints that consider roles, relationships, and time. Our first technique links certificates based on the specific roles of their individuals, and greedily selects pairs of certificates with the highest overall similarity while also considering 1-to-1 and 1-to-many linkage constraints. Our second, hybrid technique combines graph, group, and temporal linkage, and also considers relationship information between individuals and groups. We compare these techniques with state-of-the-art group, collective, and graph-based linkage approaches.

Results: We evaluate our proposed techniques on real Scottish data from 1861 to 1901 that cover the population of the Isle of Skye. In total, these data sets contain 119,042 certificates for 234,365 individuals. As ground truth we have a set of life-segments of records manually linked by domain experts. Our results indicate that even advanced techniques have difficulty in achieving high linkage quality compared to careful manual linkage. Two reasons for this are the very small name pool in our data and the changing nature of people's personal details over time. Both our proposed techniques, however, significantly outperform traditional pair-wise attribute similarity and group linkage approaches, with the greedy role-based technique achieving better results than the hybrid technique.

Conclusion: Our experiments on real data show that even with advanced linkage techniques that employ group, graph, relationship, and temporal approaches, it is challenging to achieve high-quality links from complex data such as birth, death, marriage, and census certificates that span several decades. As future work we will improve all steps of our techniques with the goal of developing highly accurate, scalable, and automatic techniques for linking large-scale complex population databases.
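
The greedy selection step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `greedy_link`, its parameters, the record identifiers, and the similarity scores are all hypothetical and chosen only for illustration. The sketch assumes candidate pairs have already been scored, then accepts them in descending order of similarity while enforcing a per-record link limit, which covers both the 1-to-1 case and the 1-to-many case.

```python
from collections import defaultdict


def greedy_link(scored_pairs, max_links_left=1, max_links_right=1, threshold=0.0):
    """Greedily accept candidate links in descending similarity order.

    scored_pairs: iterable of (left_id, right_id, similarity) tuples.
    max_links_left / max_links_right: how many links a record on that side
        may take part in; None means unlimited (the "many" side).
    threshold: pairs scoring below this value are never linked.
    All names and defaults here are illustrative assumptions.
    """
    left_used = defaultdict(int)
    right_used = defaultdict(int)
    accepted = []
    ranked = sorted(scored_pairs, key=lambda pair: pair[2], reverse=True)
    for left, right, similarity in ranked:
        if similarity < threshold:
            break  # every remaining pair scores lower still
        if max_links_left is not None and left_used[left] >= max_links_left:
            continue  # left record has reached its link limit
        if max_links_right is not None and right_used[right] >= max_links_right:
            continue  # right record has reached its link limit
        accepted.append((left, right, similarity))
        left_used[left] += 1
        right_used[right] += 1
    return accepted


# 1-to-1 example: a baby on a birth certificate can match at most one
# deceased person on a death certificate (made-up scores).
birth_death_candidates = [
    ("birth_1", "death_1", 0.92),
    ("birth_1", "death_2", 0.88),
    ("birth_2", "death_1", 0.85),
    ("birth_2", "death_2", 0.60),
]
one_to_one = greedy_link(birth_death_candidates, threshold=0.5)

# 1-to-many example: one bride on a marriage certificate may be the mother
# on several birth certificates, but each birth has only one mother.
bride_mother_candidates = [
    ("marriage_1", "birth_1", 0.90),
    ("marriage_1", "birth_2", 0.80),
    ("marriage_2", "birth_1", 0.70),
]
one_to_many = greedy_link(
    bride_mother_candidates, max_links_left=None, max_links_right=1, threshold=0.5
)
```

In the 1-to-1 example the pair (`birth_1`, `death_2`) is skipped even though it scores 0.88, because `birth_1` was already linked by a higher-scoring pair. This shows the usual trade-off of a greedy strategy: it is simple and fast, but an early choice can block a later pair that a globally optimal assignment might have preferred.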
