International Journal of Computer Science & Information Technology (IJCSIT)

A Meta Data Vault Approach for Evolutionary Integration of Big Data Sets: Case Study Using the NCBI Database for Genetic Variation

Abstract

A data warehouse (DW) integrates data from various heterogeneous data sources and creates a consolidated view of the data that is optimized for reporting and analysis. Today, business and technology are constantly evolving, which directly affects the data sources. New data sources can emerge while others become unavailable. The DW or the data mart that is based on these data sources needs to reflect these changes. Various solutions for adapting a data warehouse after changes in the data sources and the business requirements have been proposed in the literature [1]. However, research on the problem of DW evolution has focused mainly on managing changes in the dimensional model, while other aspects, such as those related to the ETL and to maintaining the history of changes, have not been addressed. The paper presents a Meta Data Vault model that includes a data vault based data warehouse and a master data management (MDM) component. A major focus of this research is to keep both the history of changes and a "single version of the truth" through an MDM integrated with the DW. The paper also outlines the load patterns used to load data into the data warehouse and the materialized views used to deliver data to end users. To test the proposed model, we used big data sets from the biomedical field, and for each modification of the data source schema we outline the changes that need to be made to the enterprise data warehouse (EDW), the data marts, and the ETL.
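
The abstract does not spell out the load patterns themselves. As a rough illustration of how a data vault style load can keep the history of changes, the following is a minimal Python sketch of a generic insert-only hub and satellite load. It is not the paper's implementation: the `Vault` class, its method names, and the example keys and attributes are all hypothetical, and in-memory structures stand in for warehouse tables.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def _hash(*parts: str) -> str:
    """Deterministic hash used for hub keys and for change detection."""
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


@dataclass
class Vault:
    """Hypothetical in-memory stand-in for one hub and one satellite."""
    # Hub: hash key -> business key; one entry per entity, never updated.
    hub: dict = field(default_factory=dict)
    # Satellite: append-only rows describing the entity over time.
    satellite: list = field(default_factory=list)

    def load(self, business_key: str, attributes: dict, source: str) -> bool:
        """Load one source record; return True if a new satellite row was added."""
        hub_key = _hash(business_key)
        self.hub.setdefault(hub_key, business_key)
        # The hash diff covers attribute names and values, so a source schema
        # change (added or dropped attribute) also counts as a change.
        hash_diff = _hash(*(f"{k}={attributes[k]}" for k in sorted(attributes)))
        latest = self.latest(business_key)
        if latest is not None and latest["hash_diff"] == hash_diff:
            return False  # nothing changed, so no new history row
        self.satellite.append({
            "hub_key": hub_key,
            "load_ts": datetime.now(timezone.utc),
            "source": source,
            "hash_diff": hash_diff,
            "attributes": dict(attributes),
        })
        return True

    def history(self, business_key: str) -> list:
        """All satellite rows for a business key, oldest first."""
        hub_key = _hash(business_key)
        return [row for row in self.satellite if row["hub_key"] == hub_key]

    def latest(self, business_key: str):
        """Most recent satellite row for a business key, or None."""
        rows = self.history(business_key)
        return rows[-1] if rows else None


if __name__ == "__main__":
    vault = Vault()
    # Hypothetical records; the key and attribute names are illustrative only.
    vault.load("variant-1", {"gene": "G1", "allele": "A"}, "source-a")
    vault.load("variant-1", {"gene": "G1", "allele": "T"}, "source-a")
    for row in vault.history("variant-1"):
        print(row["load_ts"].isoformat(), row["attributes"])
```

Because existing rows are never updated or deleted, earlier versions of a record remain queryable, while `latest` gives a simple current view of the kind a materialized view or data mart might expose.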
