BMC Genomics

A digital repository with an extensible data model for biobanking and genomic analysis management



Abstract

Motivation: Molecular biology laboratories require extensive metadata to improve data collection and analysis. The heterogeneity of the collected metadata grows as research evolves into international, multidisciplinary collaborations and as data sharing among institutions increases. A single standard is not feasible, so it becomes crucial to develop digital repositories with flexible and extensible data models, as in the case of modern integrated biobank management.

Results: We developed a novel data model in JSON format to describe heterogeneous data in a generic biomedical science scenario. The model is built on two hierarchical entities, processes and events, which roughly correspond to research studies and to analysis steps within a single study. A number of sequential events can be grouped into a process, building up a hierarchical structure that tracks patient and sample history. Each event can produce new data. Data are described by a set of user-defined metadata and may have one or more associated files. We integrated the model into a web-based digital repository with data grid storage to manage large data sets located in geographically distinct areas. We built a graphical interface that allows authorized users to define new data types dynamically, according to their requirements. Operators compose queries on metadata fields using a flexible search interface and run them on the database and on the grid. We applied the digital repository to the integrated management of samples, patients, and medical history in the BIT-Gaslini biobank. The platform currently manages 1800 samples from over 900 patients. Microarray data from 150 analyses are stored on the grid storage and replicated on two physical resources for preservation. The system is equipped with data integration capabilities with other biobanks for worldwide information sharing.

Conclusions: Our data model enables users to continuously define flexible, ad hoc, and loosely structured metadata for information sharing in specific research projects and purposes. This approach can significantly improve interdisciplinary research collaboration and allows tracking of patients' clinical records, sample management information, and genomic data. The web interface allows operators to easily manage, query, and annotate the files without dealing with the technicalities of the data grid.
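
As an illustration of the process/event structure described in the abstract, the following is a minimal sketch in Python. It is not the authors' schema: every field name (`events`, `data`, `data_type`, `metadata`, `files`), every value, and the helper `find_data` are hypothetical, chosen only to show how a process could group sequential events, how each event could produce data with user-defined metadata and associated files, and how a query on metadata fields might look.

```python
import json

# Hypothetical process record. All field names and values are illustrative
# assumptions, not the schema used by the repository described above.
process = {
    "type": "process",
    "name": "example-study",
    "events": [
        {
            "type": "event",
            "name": "sample-collection",
            "data": [
                {
                    "data_type": "sample",
                    "metadata": {"tissue": "blood", "patient_id": "P001"},
                    "files": [],
                }
            ],
        },
        {
            "type": "event",
            "name": "microarray-analysis",
            "data": [
                {
                    "data_type": "microarray",
                    "metadata": {"platform": "example-platform", "patient_id": "P001"},
                    "files": ["array_001.cel"],
                }
            ],
        },
    ],
}


def find_data(process, **criteria):
    """Return (event name, data item) pairs whose metadata match all criteria."""
    matches = []
    for event in process["events"]:
        for item in event["data"]:
            if all(item["metadata"].get(k) == v for k, v in criteria.items()):
                matches.append((event["name"], item))
    return matches


# Round-trip through JSON to show the record is plain, serializable data.
restored = json.loads(json.dumps(process))
hits = find_data(restored, patient_id="P001")
print([name for name, _ in hits])  # ['sample-collection', 'microarray-analysis']
```

Because the metadata are ordinary key/value pairs, a new data type in this sketch needs no change to the query helper; that is one plausible reading of what "extensible" means here, stated as an assumption rather than a description of the actual system.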
