Journal: BMC Medical Informatics and Decision Making

Development and validation of data quality rules in administrative health data using association rule mining


Abstract

Data quality assessment presents a challenge for research using coded administrative health data. The objective of this study is to develop and validate a set of coding association rules for coded diagnostic data. We used Canadian re-abstracted hospital discharge abstract data coded with the International Classification of Diseases, 10th revision (ICD-10). Association rule mining was conducted on the re-abstracted data in four age groups (0–4, 20–44, 45–64, and ≥65) to extract ICD-10 coding association rules at the three-digit level (category of diagnosis) and the four-digit level (category of diagnosis with etiology, anatomy, or severity). The rules were reviewed by a panel of 5 physicians and 2 classification specialists using a modified Delphi rating process. We proposed and defined two measures, variance and bias, to assess data quality using the rules. After the rule mining process and the panel review, 388 rules at the three-digit level and 275 rules at the four-digit level were developed. Half of the rules came from the ≥65 age group. The rules captured meaningful age-specific clinical associations, and those for the ≥65 age group were more complex and comprehensive than those for the other age groups. The variance and bias measures can identify rules with high bias and variance in Alberta data and provide direction for quality improvement. A set of ICD-10 data quality rules was developed and validated by a clinical and classification expert panel. The rules can be used as a tool to assess ICD-coded data, enabling the monitoring and comparison of data quality across institutions, provinces, and countries.
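To make the rule mining step concrete, the following is a minimal sketch of pairwise association rule mining over coded discharge records, written in Python. It is not the authors' pipeline: the function names, the thresholds, and the toy records are all hypothetical, and the abstract does not state which mining algorithm or cut-offs were used. The sketch truncates each code to the three-digit or four-digit level, then keeps rules `lhs -> rhs` whose support and confidence clear the chosen thresholds.

```python
from collections import Counter
from itertools import combinations


def truncate(code, digits=3):
    """Reduce an ICD-10 code such as 'E11.9' to its first `digits` characters."""
    return code.replace(".", "")[:digits]


def mine_pair_rules(records, min_support, min_confidence, digits=3):
    """Return (lhs, rhs, support, confidence) for code pairs that co-occur often.

    support    = share of records containing both codes
    confidence = share of records with `lhs` that also contain `rhs`
    Thresholds are illustrative assumptions, not values from the study.
    """
    n = len(records)
    # One set of truncated codes per discharge record.
    baskets = [frozenset(truncate(c, digits) for c in rec) for rec in records]
    item_counts = Counter(code for b in baskets for code in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules, one per direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Toy records (hypothetical, not study data): each list is one discharge abstract.
records = [
    ["E11.9", "I10", "N18.3"],
    ["E11.2", "I10"],
    ["I10", "I25.1"],
    ["J18.9"],
    ["E11.9", "I10", "I25.1"],
]

rules = mine_pair_rules(records, min_support=0.3, min_confidence=0.7)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}  support={support:.2f}  confidence={confidence:.2f}")
```

Passing `digits=4` runs the same procedure at the four-digit level. In the study, rules mined this way were candidates only; they became data quality rules after review by the expert panel.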
