JMIR Public Health and Surveillance

Where No Universal Health Care Identifier Exists: Comparison and Determination of the Utility of Score-Based Persons Matching Algorithms Using Demographic Data

Abstract

Background: A universal health care identifier (UHID) supports longitudinal medical records in health care settings where persons must be followed up and tracked across health care sectors. HIV case-based surveillance (CBS) involves longitudinal follow-up of HIV cases from diagnosis through linkage to care and treatment, and it is recommended for second-generation HIV surveillance. Without a UHID, records can be matched, linked, and deduplicated using score-based persons matching algorithms. We present a stepwise process of score-based persons matching based on demographic data to improve HIV CBS and other longitudinal data systems.

Objective: The aim of this study is to compare deterministic and score-based persons matching algorithms for records linkage and matching using demographic data in settings without a UHID.

Methods: We used HIV CBS pilot data from 124 facilities in 2 high HIV-burden counties (Siaya and Kisumu) in western Kenya. For efficient processing, data were grouped into 3 scenarios: (1) within HIV testing services (HTS), (2) across HTS and care (HTS-care), and (3) within care. For deterministic matching, we directly compared identifiers and pseudo-identifiers from medical records to determine matches. We used the R stringdist package for Jaro and Jaro-Winkler score-based matching and for Levenshtein and Damerau-Levenshtein string edit distance calculations. For the Jaro-Winkler method, we used a penalty (p) of 0.1. For Levenshtein and Damerau-Levenshtein, we applied 4 weights (ω): deletion ω=0.8, insertion ω=0.8, substitution ω=1, and transposition ω=0.5.

Results: We analyzed 12,157 cases. Of these, 4073/12,157 (33.5%) were from HTS, 1091/12,157 (9.0%) were from HTS-care, and 6993/12,157 (57.5%) were within care. The deterministic process identified 435/12,157 (3.6%) duplicate records, which left 96.4% (11,722/12,157) unique cases.
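The deterministic step compares identifiers exactly. It can be sketched as grouping records on a normalized key. This is a minimal illustration, not the authors' code. The key fields (`first_name`, `last_name`, `sex`, `birth_date`) and the normalization rule are hypothetical, because the abstract does not list which identifiers were compared. The sketch counts every record after the first in each group as a duplicate. Unique cases are then the total minus the duplicates, as in 12,157 - 435 = 11,722.

```python
from collections import defaultdict

# Hypothetical pseudo-identifier fields; the abstract does not list the
# identifiers actually compared in the study.
KEY_FIELDS = ("first_name", "last_name", "sex", "birth_date")


def normalize(value: str) -> str:
    """Upper-case and keep only letters/digits, so spacing, case and
    punctuation differences do not block an exact match."""
    return "".join(ch for ch in value.upper() if ch.isalnum())


def match_key(record: dict) -> tuple:
    """Build the exact-match key from the (hypothetical) key fields;
    a missing field contributes an empty string."""
    return tuple(normalize(record.get(field, "")) for field in KEY_FIELDS)


def deterministic_duplicate_groups(records: list) -> list:
    """Group record indices that share an identical normalized key and
    return only the groups that contain more than one record."""
    groups = defaultdict(list)
    for idx, record in enumerate(records):
        groups[match_key(record)].append(idx)
    return [idxs for idxs in groups.values() if len(idxs) > 1]


def count_duplicate_records(records: list) -> int:
    """Records beyond the first in each group are counted as duplicates."""
    return sum(len(g) - 1 for g in deterministic_duplicate_groups(records))


if __name__ == "__main__":
    # Illustrative records only (not study data).
    records = [
        {"first_name": "Jane", "last_name": "Doe", "sex": "F", "birth_date": "1990-05-01"},
        {"first_name": " jane", "last_name": "DOE", "sex": "f", "birth_date": "1990-05-01"},
        {"first_name": "Jane", "last_name": "Doe", "sex": "F", "birth_date": "1990-01-05"},
    ]
    dups = count_duplicate_records(records)
    print(deterministic_duplicate_groups(records))  # [[0, 1]]
    print(len(records) - dups)                      # 2 unique cases
```

Exact matching misses records that differ by a single typo. The third record above differs only by swapped day and month digits, and it stays separate. The score-based comparators are meant to catch cases like this.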
Overall, among the score-based methods, Jaro-Winkler identified the most duplicate records (686/12,157, 5.6%) and Jaro the fewest (546/12,157, 4.5%). Levenshtein and Damerau-Levenshtein each identified 563/12,157 (4.6%). By scenario, the duplicate records identified by each method were: (1) Jaro: 5.7% (234/4073) within HTS, 0.4% (4/1091) in HTS-care, and 4.4% (308/6993) within care; (2) Jaro-Winkler: 7.4% (302/4073) within HTS, 0.5% (6/1091) in HTS-care, and 5.4% (378/6993) within care; (3) Levenshtein: 6.4% (262/4073) within HTS, 0.4% (4/1091) in HTS-care, and 4.2% (297/6993) within care; and (4) Damerau-Levenshtein: 6.4% (262/4073) within HTS, 0.4% (4/1091) in HTS-care, and 4.2% (297/6993) within care.

Conclusions: Without deduplication, overreporting occurs across the care and treatment cascade. Jaro-Winkler score-based matching performed best in identifying matches. A pragmatic estimate of duplicates in health care settings can provide a corrective factor for modeled estimates used in targeting and program planning. We propose that a standard national deduplication and persons-matching algorithm based on demographic data would improve accuracy in monitoring HIV care clinical cascades, even without a UHID.
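The score-based comparators named in the Methods can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' R code.

- Jaro and Jaro-Winkler follow the common textbook definitions. The prefix is capped at 4 characters, there is no boost threshold, and the prefix factor `p` = 0.1 matches the Methods.
- The weighted edit distance uses the Methods' weights: deletion 0.8, insertion 0.8, substitution 1, and transposition 0.5.
- The transposition variant here is the restricted Damerau-Levenshtein distance, also called optimal string alignment. stringdist's `method = "dl"` computes the unrestricted distance, so scores can differ on some inputs.
- The abstract does not give the cut-off used to declare two records duplicates, so no threshold is included.

```python
def jaro_similarity(a: str, b: str) -> float:
    """Jaro similarity (1 = identical, 0 = no matching characters)."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    # Characters match only if equal and within this many positions of each other.
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Half the number of matched characters that appear in a different order.
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    half_transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) / 2
    return (matches / len(a) + matches / len(b)
            + (matches - half_transpositions) / matches) / 3


def jaro_winkler_similarity(a: str, b: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the common-prefix length (capped at
    max_prefix) scaled by p; p = 0.1 mirrors the value in the Methods."""
    sim = jaro_similarity(a, b)
    prefix = 0
    for x, y in zip(a[:max_prefix], b[:max_prefix]):
        if x != y:
            break
        prefix += 1
    return sim + prefix * p * (1.0 - sim)


def weighted_edit_distance(a: str, b: str, w_del: float = 0.8, w_ins: float = 0.8,
                           w_sub: float = 1.0, w_trans: float = 0.5,
                           transpositions: bool = True) -> float:
    """Weighted Levenshtein distance (transpositions=False) or restricted
    Damerau-Levenshtein / optimal string alignment (transpositions=True)."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * w_del          # delete every character of a
    for j in range(1, m + 1):
        d[0][j] = j * w_ins          # insert every character of b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else w_sub
            d[i][j] = min(d[i - 1][j] + w_del,
                          d[i][j - 1] + w_ins,
                          d[i - 1][j - 1] + sub)
            # Swap of two adjacent characters (each substring edited at most once).
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + w_trans)
    return d[n][m]


if __name__ == "__main__":
    print(round(jaro_similarity("MARTHA", "MARHTA"), 4))          # 0.9444
    print(round(jaro_winkler_similarity("MARTHA", "MARHTA"), 4))  # 0.9611
    print(round(weighted_edit_distance("MARTHA", "MARHTA", transpositions=False), 2))  # 1.6
    print(round(weighted_edit_distance("MARTHA", "MARHTA"), 2))                        # 0.5
```

With these weights, turning MARTHA into MARHTA costs 1.6 under weighted Levenshtein, because one deletion plus one insertion is cheaper than two substitutions. It costs only 0.5 when an adjacent transposition is allowed. Deletion and insertion carry the same weight here, so the direction of comparison does not change the distance. stringdist reports Jaro and Jaro-Winkler as distances, that is, 1 minus the similarities returned above.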