JMIR Medical Informatics

Enabling Agile Clinical and Translational Data Warehousing: Platform Development and Evaluation


Abstract

Background: Modern data-driven medical research provides new insights into the development and course of diseases and enables novel methods of clinical decision support. Clinical and translational data warehouses, such as Informatics for Integrating Biology and the Bedside (i2b2) and tranSMART, are important infrastructure components. They give users unified access to the large heterogeneous data sets that this research requires and support use cases such as cohort selection, hypothesis generation, and ad hoc data analysis.

Objective: Different warehousing platforms are often needed to support different use cases and different types of data. Moreover, achieving an optimal data representation within the target systems requires specific domain knowledge when designing data-loading processes. Consequently, informaticians need to work closely with clinicians and researchers in short iterations. This is challenging because installing and maintaining warehousing platforms can be complex and time-consuming. Furthermore, data loading typically requires significant effort for data preprocessing, cleansing, and restructuring. The platform described in this study aims to address these challenges.

Methods: We formulated system requirements to achieve agility in terms of platform management and data loading. The derived system architecture includes a cloud infrastructure with unified management interfaces for multiple warehouse platforms and a data-loading pipeline with a declarative configuration paradigm and a meta-loading approach. The latter compiles data and configuration files into the forms required by existing loading tools, thereby automating a wide range of data restructuring and cleansing tasks. We demonstrated the fulfillment of the requirements and the originality of our approach through an experimental evaluation and a comparison with previous work.

Results: The platform supports both i2b2 and tranSMART with built-in security. Our experiments showed that the loading pipeline accepts input data that cannot be loaded with existing tools without preprocessing. Moreover, it significantly reduced effort, shrinking the required configuration files by factors of up to 22 for tranSMART and 1135 for i2b2. The time required for the compilation process was roughly equivalent to the time required for the actual data loading. A comparison with other tools showed that our solution was the only one fulfilling all requirements.

Conclusions: Our platform significantly reduces the effort required for managing clinical and translational warehouses and for loading data in various formats and structures, such as the complex entity-attribute-value structures often found in laboratory data. Moreover, because the required configuration files are very compact, it facilitates the iterative refinement of data representations in the target platforms. The quantitative measurements presented are consistent with our experience of significantly reduced effort for building warehousing platforms in close cooperation with medical researchers. Both the cloud-based hosting infrastructure and the data-loading pipeline are available to the community as open source software with comprehensive documentation.
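
As an illustration of the declarative, meta-loading idea described under Methods, the following minimal Python sketch pivots entity-attribute-value laboratory rows into one record per patient, driven by a small configuration dictionary. It is not the platform's actual configuration format or code: the `CONFIG` keys, the `eav_to_wide` function, and the sample rows are all hypothetical and chosen only to show how a compact declaration could replace hand-written restructuring and cleansing steps.

```python
from collections import defaultdict

# Hypothetical declarative configuration; NOT the platform's real format.
CONFIG = {
    "entity": "patient_id",   # column identifying the entity
    "attribute": "test",      # column holding the attribute name
    "value": "result",        # column holding the attribute value
    "rename": {"HB": "Hemoglobin", "GLU": "Glucose"},  # code -> label
}


def eav_to_wide(rows, config):
    """Pivot EAV rows into one record per entity, as declared in config."""
    wide = defaultdict(dict)
    for row in rows:
        entity = row[config["entity"]]
        attribute = row[config["attribute"]].strip()
        column = config["rename"].get(attribute, attribute)
        value = row[config["value"]].strip()  # cleansing: trim whitespace
        if value:  # cleansing: drop empty measurements
            wide[entity][column] = value
    return [{config["entity"]: e, **cols} for e, cols in sorted(wide.items())]


# Invented sample data in entity-attribute-value form.
ROWS = [
    {"patient_id": "p1", "test": "HB", "result": "13.5"},
    {"patient_id": "p1", "test": "GLU", "result": " 5.1 "},
    {"patient_id": "p2", "test": "HB", "result": ""},
    {"patient_id": "p2", "test": "GLU", "result": "6.0"},
]

WIDE = eav_to_wide(ROWS, CONFIG)
for record in WIDE:
    print(record)
```

In a meta-loading setting, a wide table of this kind would then be written out together with generated mapping files in whatever form the downstream loading tool expects; that step is omitted here because it depends on the target tool.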
