Where No Universal Health Care Identifier Exists: Comparison and Determination of the Utility of Score-Based Persons Matching Algorithms Using Demographic Data

Anthony Waruru; Agnes Natukunda; Lilly M Nyagah; Timothy A Kellogg; Emily Zielinski-Gutierrez; Wanjiru Waruiru; Kenneth Masamaro; Richelle Harklerode; Jacob Odhiambo; Eric-Jan Manders; Peter W Young

Abstract

Background: A universal health care identifier (UHID) facilitates the development of longitudinal medical records in health care settings where follow-up and tracking of persons across health care sectors are needed. HIV case-based surveillance (CBS) entails longitudinal follow-up of HIV cases from diagnosis through linkage to care and treatment, and is recommended for second-generation HIV surveillance. In the absence of a UHID, records matching, linking, and deduplication may be done using score-based persons matching algorithms. We present a stepwise process of score-based persons matching algorithms based on demographic data to improve HIV CBS and other longitudinal data systems.

Objective: The aim of this study is to compare deterministic and score-based persons matching algorithms in records linkage and matching using demographic data in settings without a UHID.

Methods: We used HIV CBS pilot data from 124 facilities in 2 high HIV-burden counties (Siaya and Kisumu) in western Kenya. For efficient processing, data were grouped into 3 scenarios: (1) within HIV testing services (HTS), (2) HTS-care, and (3) within care. In deterministic matching, we directly compared identifiers and pseudo-identifiers from medical records to determine matches. We used the R stringdist package for Jaro and Jaro-Winkler score-based matching and for the Levenshtein and Damerau-Levenshtein string edit distance calculation methods. For the Jaro-Winkler method, we used a penalty (p)=0.1. For Levenshtein and Damerau-Levenshtein, we applied 4 weights (ω): deletion ω=0.8, insertion ω=0.8, substitution ω=1, and transposition ω=0.5.

Results: We abstracted 12,157 cases, of which 4073/12,157 (33.5%) were from HTS, 1091/12,157 (9.0%) from HTS-care, and 6993/12,157 (57.5%) within care. The deterministic process identified 435/12,157 (3.6%) duplicate records, yielding 96.4% (11,722/12,157) unique cases. Overall, of the score-based methods, Jaro-Winkler yielded the most duplicate records (686/12,157, 5.6%), Jaro yielded the fewest (546/12,157, 4.5%), and Levenshtein and Damerau-Levenshtein each yielded 4.6% (563/12,157) duplicates. Specifically, duplicate records yielded by method were: (1) Jaro 5.7% (234/4073) within HTS, 0.4% (4/1091) in HTS-care, and 4.4% (308/6993) within care; (2) Jaro-Winkler 7.4% (302/4073) within HTS, 0.5% (6/1091) in HTS-care, and 5.4% (378/6993) within care; (3) Levenshtein 6.4% (262/4073) within HTS, 0.4% (4/1091) in HTS-care, and 4.2% (297/6993) within care; and (4) Damerau-Levenshtein 6.4% (262/4073) within HTS, 0.4% (4/1091) in HTS-care, and 4.2% (297/6993) within care.

Conclusions: Without deduplication, overreporting occurs across the care and treatment cascade. Jaro-Winkler score-based matching performed the best in identifying matches. A pragmatic estimate of duplicates in health care settings can provide a corrective factor for modeled estimates and for targeting and program planning. We propose that, even without a UHID, a standard national deduplication and persons-matching algorithm that utilizes demographic data would improve accuracy in monitoring HIV care clinical cascades.
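The scoring methods named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' R stringdist code: the function names, the restricted (optimal string alignment) form of Damerau-Levenshtein, and the example strings are assumptions made for illustration. Only the parameter values (p=0.1; weights 0.8, 0.8, 1, and 0.5) come from the abstract. The Jaro functions here return a similarity, where 1.0 means identical strings, whereas a distance is 1 minus that value.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and close enough in position.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Half the number of matched characters that appear in a different order.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for a shared prefix of up to max_prefix characters."""
    sim = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return sim + prefix * p * (1.0 - sim)


def weighted_osa(s1: str, s2: str, w_del: float = 0.8, w_ins: float = 0.8,
                 w_sub: float = 1.0, w_trans: float = 0.5) -> float:
    """Weighted edit distance with adjacent transpositions (restricted
    Damerau-Levenshtein, an assumption; 0.0 means identical strings)."""
    n, m = len(s1), len(s2)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * w_del
    for j in range(1, m + 1):
        d[0][j] = j * w_ins
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if s1[i - 1] == s2[j - 1] else w_sub
            best = min(d[i - 1][j] + w_del,      # delete from s1
                       d[i][j - 1] + w_ins,      # insert into s1
                       d[i - 1][j - 1] + cost)   # substitute or keep
            if (i > 1 and j > 1 and s1[i - 1] == s2[j - 2]
                    and s1[i - 2] == s2[j - 1]):
                best = min(best, d[i - 2][j - 2] + w_trans)  # swap neighbors
            d[i][j] = best
    return d[n][m]


if __name__ == "__main__":
    # Textbook example pair, not data from the study.
    print(round(jaro("MARTHA", "MARHTA"), 3))          # 0.944
    print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961
    print(weighted_osa("MARTHA", "MARHTA"))            # 0.5
```

In a deduplication workflow, scores like these would be computed on demographic fields such as names and compared against a cutoff to flag candidate duplicate records. The abstract does not state the cutoffs used, so none is assumed here.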


Bibliographic information

Source: JMIR Public Health and Surveillance, 2018, Issue 4
Classification: Health care organizations and services (health services administration)
Keywords: deterministic matching; score-based matching; HIV case-based surveillance; unique case identification; universal health care identifier
