Journal: International Journal of Population Data Science

Bringing more science to the art of linkage: Using positive predictive value of weight and outcome sets to reduce subjectivity in probabilistic linkage



Abstract

Objective: Our organization currently performs probabilistic linkage in which the final linkage classification is set by expert users who apply rules referencing weights and comparison outcome sets. This classification yields a linkage that is perceived as comprehensive. Acknowledged weaknesses are variation in expert knowledge and in how it is applied; working through the expert rules is also often time-consuming. We piloted a new approach based on a file-level summary of "positive predictive value" (PPV) for weights and outcome sets. We contrast the new approach with the previous one and identify strengths and weaknesses.

Approach: We resolve linkages using two approaches: the existing method, in which expert users apply rules to weights and comparison outcomes, and a new one that uses positive predictive values (PPV) to reduce subjectivity in resolution. The new method produces summary true positive, false positive, and positive predictive values for each weight and each outcome set within a file of candidate pairs above the cutoff. The top-weighted pair increments the true positive (TP) count for the weight and outcome set on the record. All other candidate pairs increment the false positive (FP) count. At the end of the file, PPV is calculated for each weight and outcome set as TP/(TP+FP). Additionally, a 0.9 PPV weight threshold is established from the summary, excluding weights with fewer than N occurrences. The approach presumes successful link [...] Accepted links include top-ranked records whose weight is greater than the 0.9 PPV threshold, and top candidates whose outcome set has a summary PPV >= 0.9, excluding outcome sets with fewer than N occurrences.

Results: In the pilot, the new method produced linkage results nearly on par with the previously employed methods. Importantly, it establishes accepted links by a consistent methodology, allowing for increased standardization. The method eases identification of a threshold weight, and referencing the summary comparison outcome PPVs identifies additional confident links, in larger population-based linkages, that a single weight threshold may exclude. Components require minimal tuning to data characteristics, error tolerance, and result expectations. The new method is unable to identify links that the legacy method established through additional processing or manual review. Further review of the approach as applied to other datasets is needed.
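
The PPV summary and acceptance rule described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `Pair`, `top_pairs`, `ppv_summary`, `weight_threshold`, and `accept_links` are hypothetical, as is the toy data. The sketch also assumes that the top-weighted pair is chosen per record, that ties go to the first pair seen, that "occurrences" means TP + FP, and that the weight threshold is the lowest eligible weight whose PPV reaches 0.9. The abstract does not specify any of these details.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    record_id: str     # record being linked
    candidate_id: str  # candidate match for that record
    weight: int        # total match weight (integer-binned for this sketch)
    outcome: str       # comparison outcome set, e.g. agreement flags per field


def top_pairs(pairs):
    """Highest-weighted pair per record (assumption: first pair wins ties)."""
    best = {}
    for p in pairs:
        if p.record_id not in best or p.weight > best[p.record_id].weight:
            best[p.record_id] = p
    return best


def ppv_summary(pairs, key):
    """Map each key value (a weight or an outcome set) to (TP, FP, PPV)."""
    best = top_pairs(pairs)
    counts = defaultdict(lambda: [0, 0])  # key value -> [TP, FP]
    for p in pairs:
        is_top = best[p.record_id] is p
        counts[key(p)][0 if is_top else 1] += 1
    return {k: (tp, fp, tp / (tp + fp)) for k, (tp, fp) in counts.items()}


def weight_threshold(weight_summary, min_ppv=0.9, min_n=2):
    """Assumed rule: lowest weight with >= min_n occurrences and PPV >= min_ppv."""
    eligible = [w for w, (tp, fp, ppv) in weight_summary.items()
                if tp + fp >= min_n and ppv >= min_ppv]
    return min(eligible) if eligible else None


def accept_links(pairs, min_ppv=0.9, min_n=2):
    """Accept a top pair if its weight exceeds the threshold, or if its
    outcome set has PPV >= min_ppv with at least min_n occurrences."""
    by_weight = ppv_summary(pairs, key=lambda p: p.weight)
    by_outcome = ppv_summary(pairs, key=lambda p: p.outcome)
    threshold = weight_threshold(by_weight, min_ppv, min_n)
    accepted = []
    for p in top_pairs(pairs).values():
        tp, fp, ppv = by_outcome[p.outcome]
        above_weight = threshold is not None and p.weight > threshold
        good_outcome = tp + fp >= min_n and ppv >= min_ppv
        if above_weight or good_outcome:
            accepted.append((p.record_id, p.candidate_id))
    return accepted


# Toy candidate pairs above the cutoff (invented for illustration only).
sample = [
    Pair("r1", "c1", 20, "YYY"), Pair("r1", "c2", 8, "YNN"),
    Pair("r2", "c3", 20, "YYY"), Pair("r2", "c4", 8, "YNN"),
    Pair("r3", "c5", 12, "YYN"), Pair("r3", "c6", 8, "YNN"),
    Pair("r4", "c7", 12, "YYN"),
    Pair("r5", "c8", 8, "YNN"),
    Pair("r6", "c9", 25, "YYY"),
]

if __name__ == "__main__":
    print(ppv_summary(sample, key=lambda p: p.weight))
    print(accept_links(sample))
```

In the toy data, the top pairs for `r3` and `r4` do not exceed the weight threshold but are still accepted because their outcome set has a high summary PPV. This mirrors the abstract's point that outcome-set PPVs can recover confident links that a single weight threshold would exclude.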
