Journal: Information Systems

Distributed snapshot maintenance in wide-column NoSQL databases using partitioned incremental ETL pipelines


Abstract

Wide-column NoSQL databases are an important class of NoSQL (Not only SQL) databases that scale horizontally and offer high access performance on sparse tables. With the current trend towards big Data Warehouses (DWs), it is attractive to run existing business intelligence/data warehousing applications on larger volumes of data in wide-column NoSQL databases at low latency, either by mapping multidimensional models to wide-column NoSQL models or by using additional SQL add-ons. For example, applications such as retail management can run over integrated data sets stored in big DWs or in the cloud to capture current item-selling trends. Many of these systems also employ Snapshot Isolation (SI) as a concurrency control mechanism to achieve high throughput for read-heavy workloads. SI works well in a DW environment, as analytical queries can work on consistent snapshots and are not affected by concurrent update jobs performed by online incremental Extract-Transform-Load (ETL) flows that refresh fact/dimension tables. However, the snapshot made available in the DW is often stale: at the moment an analytical query is issued, the source updates (e.g. in a remote retail store) may not yet have been extracted and processed by the ETL process, owing to high input data volume or slow processing speed. This staleness may cause incorrect results for time-critical decision support queries. To address this problem, snapshots that are to be accessed by analytical queries first need to be maintained by the corresponding ETL flows so that they reflect source updates according to given freshness needs. Snapshot maintenance in this work means maintaining the distributed data partitions that are required by a query. Since most NoSQL databases are not ACID compliant and do not provide full-fledged distributed transaction support, a snapshot may be derived inconsistently when its data partitions are updated by different ETL maintenance jobs.
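
The inconsistency described in the last sentence of the abstract can be illustrated with a small sketch. This is not the paper's method: the `Partition` class, the partition names, and the batch numbering are hypothetical, and the code only shows how a query that reads partitions refreshed by independent ETL jobs, with no distributed transaction around them, can observe a mix of source-update batches.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One data partition; `version` is the last ETL batch applied (hypothetical model)."""
    name: str
    version: int = 0
    rows: dict = field(default_factory=dict)


def apply_batch(partition, batch_id, updates):
    """Hypothetical ETL maintenance job: refreshes a single partition only.

    There is deliberately no cross-partition transaction here.
    """
    partition.rows.update(updates)
    partition.version = batch_id


def read_snapshot(partitions):
    """Return the batch version a query would see in each partition."""
    return {p.name: p.version for p in partitions}


def is_consistent(snapshot):
    """A snapshot is consistent if all partitions reflect the same batch."""
    return len(set(snapshot.values())) <= 1


p_eu = Partition("sales_eu")
p_us = Partition("sales_us")

# Batch 1 has been applied to both partitions.
apply_batch(p_eu, 1, {"item42": 10})
apply_batch(p_us, 1, {"item42": 7})
consistent_view = read_snapshot([p_eu, p_us])

# Batch 2 has reached only one partition when the query arrives.
apply_batch(p_eu, 2, {"item42": 12})
mixed_view = read_snapshot([p_eu, p_us])

print(is_consistent(consistent_view), is_consistent(mixed_view))
```

In this toy model the second read mixes batch 2 for one partition with batch 1 for the other, which is the kind of inconsistently derived snapshot the abstract refers to.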
