Enabling Agile Clinical and Translational Data Warehousing: Platform Development and Evaluation

Helmut Spengler; Claudia Lang; Tanmaya Mahapatra; Ingrid Gatz; Klaus A Kuhn; Fabian Prasser

Abstract

Background: Modern data-driven medical research provides new insights into the development and course of diseases and enables novel methods of clinical decision support. Clinical and translational data warehouses, such as Informatics for Integrating Biology and the Bedside (i2b2) and tranSMART, are important infrastructure components that provide users with unified access to the large heterogeneous data sets needed to realize this and support use cases such as cohort selection, hypothesis generation, and ad hoc data analysis.

Objective: Often, different warehousing platforms are needed to support different use cases and different types of data. Moreover, to achieve an optimal data representation within the target systems, specific domain knowledge is needed when designing data-loading processes. Consequently, informaticians need to work closely with clinicians and researchers in short iterations. This is a challenging task, as installing and maintaining warehousing platforms can be complex and time-consuming. Furthermore, data loading typically requires significant effort in terms of data preprocessing, cleansing, and restructuring. The platform described in this study aims to address these challenges.

Methods: We formulated system requirements to achieve agility in terms of platform management and data loading. The derived system architecture includes a cloud infrastructure with unified management interfaces for multiple warehouse platforms and a data-loading pipeline with a declarative configuration paradigm and a meta-loading approach. The latter compiles data and configuration files into the forms required by existing loading tools, thereby automating a wide range of data restructuring and cleansing tasks. We demonstrated the fulfillment of the requirements and the originality of our approach through an experimental evaluation and a comparison with previous work.

Results: The platform supports both i2b2 and tranSMART with built-in security. Our experiments showed that the loading pipeline accepts input data that cannot be loaded with existing tools without preprocessing. Moreover, it lowered efforts significantly, reducing the size of the required configuration files by factors of up to 22 for tranSMART and 1135 for i2b2. The time required to perform the compilation process was roughly equivalent to the time required for actual data loading. Comparison with other tools showed that our solution was the only tool fulfilling all requirements.

Conclusions: Our platform significantly reduces the efforts required for managing clinical and translational warehouses and for loading data in various formats and structures, such as the complex entity-attribute-value structures often found in laboratory data. Moreover, it facilitates the iterative refinement of data representations in the target platforms, as the required configuration files are very compact. The quantitative measurements presented are consistent with our experiences of significantly reduced efforts for building warehousing platforms in close cooperation with medical researchers. Both the cloud-based hosting infrastructure and the data-loading pipeline are available to the community as open source software with comprehensive documentation.
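
The abstract describes a meta-loading step that compiles raw data plus a compact declarative configuration into the files that existing loading tools expect, including restructuring entity-attribute-value (EAV) laboratory data. The following is a minimal, hypothetical sketch of that idea, not the authors' implementation. It assumes an illustrative configuration layout (the CONFIG keys), a SUBJ_ID subject column, and a column-mapping layout (filename, category_cd, col_nbr, data_label) only loosely modeled on the tab-separated mapping files used by tranSMART clinical data loaders; all of these names are assumptions.

```python
import csv
import io

# Hypothetical declarative configuration (all keys and values are
# illustrative assumptions): which input columns hold the entity (patient),
# the attribute (lab code) and the value, and how each attribute should be
# represented in the target warehouse as (category path, data label).
# "+" is used as a category path separator here purely for illustration.
CONFIG = {
    "data_file": "labs.tsv",
    "entity": "patient_id",
    "attribute": "lab_code",
    "value": "result",
    "concepts": {
        "HB": ("Laboratory+Blood", "Hemoglobin"),
        "CRP": ("Laboratory+Inflammation", "C-reactive protein"),
    },
}


def pivot_eav(rows, cfg):
    """Restructure entity-attribute-value rows into one record per entity."""
    wide = {}  # dicts keep insertion order: entities in order of first appearance
    for row in rows:
        code = row[cfg["attribute"]]
        if code not in cfg["concepts"]:
            continue  # attributes without a declared mapping are dropped
        record = wide.setdefault(row[cfg["entity"]], {})
        record[code] = row[cfg["value"]].strip()  # trivial cleansing step
    return wide


def compile_for_loader(rows, cfg):
    """Compile EAV rows plus config into a wide data file and a mapping file."""
    wide = pivot_eav(rows, cfg)
    codes = list(cfg["concepts"])  # output column order follows the config

    data_out = io.StringIO()
    writer = csv.writer(data_out, delimiter="\t", lineterminator="\n")
    writer.writerow(["SUBJ_ID"] + codes)
    for entity, record in wide.items():
        # Missing measurements become empty cells.
        writer.writerow([entity] + [record.get(c, "") for c in codes])

    mapping_out = io.StringIO()
    mapper = csv.writer(mapping_out, delimiter="\t", lineterminator="\n")
    mapper.writerow(["filename", "category_cd", "col_nbr", "data_label"])
    mapper.writerow([cfg["data_file"], "", 1, "SUBJ_ID"])
    for col_nbr, code in enumerate(codes, start=2):
        category, label = cfg["concepts"][code]
        mapper.writerow([cfg["data_file"], category, col_nbr, label])
    return data_out.getvalue(), mapping_out.getvalue()


if __name__ == "__main__":
    eav_input = (
        "patient_id\tlab_code\tresult\n"
        "P1\tHB\t13.5\n"
        "P1\tCRP\t 4.1\n"
        "P2\tHB\t12.9\n"
        "P2\tXYZ\t7\n"
    )
    rows = csv.DictReader(io.StringIO(eav_input), delimiter="\t")
    data_tsv, mapping_tsv = compile_for_loader(rows, CONFIG)
    print(data_tsv)
    print(mapping_tsv)
```

With the sample input, P2 receives an empty CRP cell and the unmapped XYZ row is dropped. The point of the sketch is that only the short CONFIG dictionary is maintained by hand; the wide data file and the mapping file are both derived from it, so changing how a lab value is represented means editing one entry rather than regenerating two files.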

Bibliographic Information

Source: JMIR Medical Informatics, 2020, Issue 7
Original format: PDF
Keywords: Docker; cohort selection; data warehouse; extract-transform-load; hosting; hypothesis generation; i2b2; tranSMART; translational research

Similar Literature

1. Enabling a Learning Health System through a Unified Enterprise Data Warehouse: The Experience of the Northwestern University Clinical and Translational Sciences (NUCATS) Institute [Journal]. Justin B. Starren, Andrew Q. Winter, Donald M. Lloyd-Jones. Clinical and Translational Science. 2015, Issue 4.

2. OncDRS: An integrative clinical and genomic data platform for enabling translational research and precision medicine [Journal]. John Orechia, Ameet Pathak, Yunling Shi, et al. Applied Translational Genomics. 2015, Issue 3.

3. Agile values or plan-driven aspects: Which factor contributes more toward the success of data warehousing, business intelligence, and analytics project development? [Journal]. Dinesh Batra. The Journal of Systems and Software. 2018, Issue DECa.

4. Agile Data Warehouse -- The Final Frontier: How a Data Warehouse Redevelopment Is Being Done in an Agile and Pragmatic Way [Conference]. Terry S. Bunio. 2012 Agile Conference. 2012.

5. Development of a three-dimensional model for micrometastatic ovarian cancer: A translational research platform to rapidly evaluate mechanism-based combination treatments [Dissertation]. Imran Rizvi. 2010.

6. Enabling a Learning Health System through a Unified Enterprise Data Warehouse: The Experience of the Northwestern University Clinical and Translational Sciences (NUCATS) Institute [Other]. Justin B. Starren, Andrew Q. Winter, Donald M. Lloyd-Jones. 2015.

7. OncDRS: An integrative clinical and genomic data platform for enabling translational research and precision medicine [Other]. John Orechia, Ameet Pathak, Yunling Shi, et al. 2015.
