Journal of Economic Literature

How Well Do Automated Linking Methods Perform? Lessons from US Historical Data

Abstract

This paper reviews the literature in historical record linkage in the United States and examines the performance of widely used record-linking algorithms and common variations in their assumptions. We use two high-quality, hand-linked data sets and one synthetic ground truth to examine the direct effects of linking algorithms on data quality. We find that (ⅰ) no algorithm (including hand linking) consistently produces representative samples; (ⅱ) 15 to 37 percent of links chosen by widely used algorithms are classified as errors by trained human reviewers; and (ⅲ) false links are systematically related to baseline sample characteristics, showing that some algorithms may introduce systematic measurement error into analyses. A case study shows that the combined effects of (ⅰ)-(ⅲ) attenuate estimates of the intergenerational income elasticity by up to 29 percent, and common variations in algorithm assumptions result in greater attenuation. As current practice moves to automate linking and increase link rates, these results highlight the important potential consequences of linking errors on inferences with linked data. We conclude with constructive suggestions for reducing linking errors and directions for future research.
