Source: MEDINFO (foreign-language conference proceedings)

The Impact of a Growing Minority Population on Identification of Duplicate Records in an Enterprise Data Warehouse

Abstract

Patient medical records are often fragmented across disparate healthcare databases, potentially resulting in duplicate records that may be detrimental to healthcare services. These duplicate records can be found through a process called record linkage. This paper describes a set of duplicate records in a medical data warehouse found by linking to an external resource containing family history and vital records. Our objective was to investigate the impact that database characteristics and linkage methods have on identifying duplicate records using an external resource. Frequency counts were made for demographic field values and compared among the set of duplicate records, the data warehouse, and the external resource. We identified considerations for understanding how records labeled as duplicates relate to dataset characteristics and linkage methods. Several noticeable patterns emerged where frequency counts deviated between sets from what was expected, including how the growth of a minority population affected which records were identified as duplicates. Record linkage is a complex process whose results can be affected by subtleties in data characteristics, changes in data trends, and reliance on external data sources. These factors should be taken into account to ensure that any anomalies in results describe real effects and are not artifacts caused by datasets or linkage methods. This paper describes how frequency count analysis can be an effective way to detect and resolve anomalies in linkage results and how external resources that provide additional contextual information can prove useful in discovering duplicate records.
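The frequency count comparison mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the field name `ethnicity`, the toy records, the 0.10 threshold, and the helper names `value_shares` and `share_deviations` are all hypothetical and chosen only for illustration. The idea is to compute each value's share of records for one demographic field in two sets (for example, the duplicate set and the data warehouse) and flag values whose shares differ noticeably.

```python
from collections import Counter


def value_shares(records, field):
    """Return each value's share of the records for one demographic field."""
    counts = Counter(r.get(field, "") for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()} if total else {}


def share_deviations(subset, reference, field, threshold=0.10):
    """Flag values whose share in `subset` differs from `reference` by more than `threshold`."""
    sub = value_shares(subset, field)
    ref = value_shares(reference, field)
    flagged = {}
    for value in sorted(set(sub) | set(ref)):
        diff = sub.get(value, 0.0) - ref.get(value, 0.0)
        if abs(diff) > threshold:
            flagged[value] = round(diff, 3)
    return flagged


# Toy data (hypothetical): group "B" is 20% of the warehouse
# but 50% of the records labeled as duplicates.
warehouse = [{"ethnicity": "A"}] * 80 + [{"ethnicity": "B"}] * 20
duplicates = [{"ethnicity": "A"}] * 5 + [{"ethnicity": "B"}] * 5

print(share_deviations(duplicates, warehouse, "ethnicity"))
```

A flagged value is only a prompt for closer inspection. As the abstract cautions, a deviation may reflect a real effect or an artifact of the datasets or the linkage method.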