Journal: Statistics in Medicine

Methods for analyzing data from probabilistic linkage strategies based on partially identifying variables


Abstract

In record linkage studies, unique identifiers are often unavailable, so the linkage procedure depends on combinations of partially identifying variables with low discriminating power. As a consequence, wrongly linked covariate and outcome pairs are created and bias further analysis of the linked data. In this article, we investigated two estimators that correct for linkage error in regression analysis. We extended the estimators developed by Lahiri and Larsen and also suggested a weighted least squares approach to deal with linkage error. We considered both linear and logistic regression problems and evaluated the performance of both methods with simulations. Our results show that, in both approaches, all wrong covariate and outcome pairs need to be removed from the analysis in order to calculate unbiased regression coefficients. This removal requires strong assumptions on the structure of the data. In addition, the bias increases significantly when the assumptions do not hold and wrongly linked records influence the coefficient estimation. Our simulations showed that both methods had similar performance in linear regression problems. In logistic regression problems, the weighted least squares method showed less bias. Because the specific structure of the data in record linkage problems often leads to different assumptions, it is necessary that the analyst has prior knowledge of the nature of the data. These assumptions are more easily introduced in the weighted least squares approach than in the Lahiri and Larsen estimator.
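To make the bias mechanism concrete, the sketch below simulates a linear regression in which some outcomes are attached to the wrong covariate record, then compares a naive least squares fit with a Lahiri and Larsen style correction that regresses the linked outcome on Q X, where Q holds the linkage probabilities. It is only an illustration under simplifying assumptions that are not taken from the article: the linkage error is exchangeable (each record is correctly linked with probability `p_correct`, otherwise it receives the outcome of a uniformly chosen other record), Q is treated as known, and all function and parameter names are hypothetical. It does not reproduce the article's extended estimators or its weighted least squares approach.

```python
import numpy as np


def simulate_linked_data(n=2000, beta=(1.0, 2.0), p_correct=0.8, seed=0):
    """Simulate (X, z, Q) under an assumed exchangeable linkage error model."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])  # intercept + one covariate
    y = X @ np.asarray(beta) + rng.normal(scale=0.5, size=n)  # true outcomes

    # With probability 1 - p_correct a record is linked to another record's
    # outcome; a shift of 1..n-1 (mod n) guarantees the donor differs from i.
    wrong = rng.random(n) >= p_correct
    shift = rng.integers(1, n, size=n)
    idx = np.arange(n)
    donors = np.where(wrong, (idx + shift) % n, idx)
    z = y[donors]  # outcomes as observed after linkage

    # Linkage probability matrix: Q[i, j] = P(record i is linked to outcome j).
    Q = np.full((n, n), (1.0 - p_correct) / (n - 1))
    np.fill_diagonal(Q, p_correct)
    return X, z, Q


def naive_ols(X, z):
    """Ordinary least squares that ignores linkage error."""
    return np.linalg.lstsq(X, z, rcond=None)[0]


def lahiri_larsen(X, z, Q):
    """Least squares of z on W = Q X, using E[z] = Q X beta."""
    W = Q @ X
    return np.linalg.lstsq(W, z, rcond=None)[0]


if __name__ == "__main__":
    X, z, Q = simulate_linked_data()
    print("naive OLS     :", naive_ols(X, z))
    print("Lahiri-Larsen :", lahiri_larsen(X, z, Q))
```

Under this assumed error model the naive slope is attenuated toward zero by roughly the factor `p_correct`, while the corrected fit recovers a slope close to the true value. In practice Q has to be estimated from the linkage procedure, which is where the assumptions on the structure of the data discussed in the abstract come in.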
