BMC Medical Informatics and Decision Making

Text data extraction for a prospective, research-focused data mart: implementation and validation


Abstract

Background: Translational research typically requires data abstracted from medical records as well as data collected specifically for research. Unfortunately, many data within electronic health records are represented as text that is not amenable to aggregation for analyses. We present a scalable open source SQL Server Integration Services package, called Regextractor, for including regular expression parsers in a classic extract, transform, and load workflow. We have used Regextractor to extract discrete data from textual reports from a number of 'machine generated' sources. To validate this package, we created a pulmonary function test data mart and analyzed the quality of the data mart versus manual chart review.

Methods: Eleven variables from pulmonary function tests performed closest to the initial clinical evaluation date were studied for 100 randomly selected subjects with scleroderma. One research assistant manually reviewed, abstracted, and entered relevant data into a database. Correlation with data obtained from the automated pulmonary function test data mart within the Northwestern Medical Enterprise Data Warehouse was determined.

Results: There was near-perfect (99.5%) agreement between results generated from the Regextractor package and those obtained via manual chart abstraction. The pulmonary function test data mart has been used subsequently to monitor disease progression of patients in the Northwestern Scleroderma Registry. In addition to the pulmonary function test example presented in this manuscript, the Regextractor package has been used to create cardiac catheterization and echocardiography data marts. The Regextractor package was released as open source software in October 2009 and has been downloaded 552 times as of 6/1/2012.

Conclusions: Collaboration between clinical researchers and biomedical informatics experts enabled the development and validation of a tool (Regextractor) to parse and assemble structured data from text data contained in the electronic health record. Regextractor has been successfully used to create additional data marts in other medical domains and is available to the public.
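
The workflow summarized above (regular expression parsers applied to machine-generated report text, followed by a comparison against manual abstraction) can be illustrated with a minimal Python sketch. It is not the Regextractor package itself, which is an SQL Server Integration Services component. The report layout, the variable names, the patterns, and the use of exact-match agreement are all assumptions made for illustration; the abstract does not specify them.

```python
import re

# Hypothetical machine-generated report text; this layout is assumed for illustration.
REPORT = """\
PULMONARY FUNCTION TEST
FVC (L): 3.42  Pred: 4.10  %Pred: 83
FEV1 (L): 2.75  Pred: 3.30  %Pred: 83
DLCO (mL/min/mmHg): 18.6  Pred: 25.1  %Pred: 74
"""

# One pattern per variable. The names and patterns are illustrative assumptions,
# not Regextractor's actual configuration.
NUMBER = r"(\d+(?:\.\d+)?)"
PATTERNS = {
    "fvc_l": re.compile(r"^FVC \(L\):\s*" + NUMBER, re.MULTILINE),
    "fev1_l": re.compile(r"^FEV1 \(L\):\s*" + NUMBER, re.MULTILINE),
    "dlco": re.compile(r"^DLCO \([^)]*\):\s*" + NUMBER, re.MULTILINE),
}


def extract(report: str) -> dict:
    """Return one discrete value per variable, or None when a pattern finds no match."""
    row = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(report)
        row[name] = float(match.group(1)) if match else None
    return row


def percent_agreement(automated: dict, manual: dict) -> float:
    """Percentage of manually abstracted variables whose automated value matches exactly."""
    matches = sum(1 for name in manual if automated.get(name) == manual[name])
    return 100.0 * matches / len(manual)


if __name__ == "__main__":
    automated = extract(REPORT)
    # Hypothetical manual abstraction with one transcription difference in DLCO.
    manual = {"fvc_l": 3.42, "fev1_l": 2.75, "dlco": 18.0}
    print(automated)
    print(round(percent_agreement(automated, manual), 1))
```

Keeping one anchored pattern per variable means a change in report layout produces a visible None rather than a silently wrong number, which is the kind of discrepancy a validation against manual chart review is meant to surface.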
