Distributed snapshot maintenance in wide-column NoSQL databases using partitioned incremental ETL pipelines

Qu Weiping; Dessloch Stefan


Abstract

Wide-column NoSQL databases are an important class of NoSQL (Not only SQL) databases which scale horizontally and feature high access performance on sparse tables. With current trends towards big Data Warehouses (DWs), it is attractive to run existing business intelligence/data warehousing applications on higher volumes of data in wide-column NoSQL databases for low latency, either by mapping multidimensional models to wide-column NoSQL models or by using additional SQL add-ons. For example, applications like retail management can run over integrated data sets stored in big DWs or in the cloud to capture current item-selling trends. Many of these systems also employ Snapshot Isolation (SI) as a concurrency control mechanism to achieve high throughput for read-heavy workloads. SI works well in a DW environment, as analytical queries can work on (consistent) snapshots and are not impacted by concurrent update jobs performed by online incremental Extract-Transform-Load (ETL) flows that refresh fact/dimension tables. However, the snapshot made available in the DW is often stale: at the moment an analytical query is issued, the source updates (e.g. in a remote retail store) may not yet have been extracted and processed by the ETL process, due to high input data volume or slow processing speed. This staleness may cause incorrect results for time-critical decision support queries. To address this problem, snapshots that are to be accessed by analytical queries first need to be maintained by the corresponding ETL flows so that they reflect source updates according to given freshness needs. Snapshot maintenance in this work means maintaining the distributed data partitions that are required by a query. Since most NoSQL databases are not ACID compliant and do not provide full-fledged distributed transaction support, a snapshot may be derived inconsistently when its data partitions are updated by different ETL maintenance jobs.
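
The inconsistency described at the end of the abstract can be illustrated with a small sketch. This is not the authors' method; it is a toy model that assumes each partition tracks the latest source timestamp its ETL job has applied. The names `Partition`, `apply_deltas`, and `snapshot_is_consistent` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """One data partition of a table (hypothetical toy model)."""
    name: str
    applied_up_to: int = 0  # latest source timestamp already applied by ETL
    rows: dict = field(default_factory=dict)


def apply_deltas(partition, deltas, up_to):
    """One ETL maintenance job: apply not-yet-applied deltas with ts <= up_to."""
    for ts, key, value in deltas:
        if partition.applied_up_to < ts <= up_to:
            partition.rows[key] = value
    partition.applied_up_to = max(partition.applied_up_to, up_to)


def snapshot_is_consistent(partitions, required_ts):
    """A query's snapshot meets its freshness need only if every
    partition it touches reflects source updates up to required_ts."""
    return all(p.applied_up_to >= required_ts for p in partitions)


# Source updates as (timestamp, key, value); values are made-up sample data.
deltas_a = [(1, "item1", 10), (4, "item1", 12)]
deltas_b = [(2, "item2", 7), (5, "item2", 9)]

a = Partition("store_A")
b = Partition("store_B")

apply_deltas(a, deltas_a, up_to=5)  # job for partition A has caught up
apply_deltas(b, deltas_b, up_to=3)  # job for partition B lags behind

before = snapshot_is_consistent([a, b], required_ts=5)
apply_deltas(b, deltas_b, up_to=5)  # maintain B before answering the query
after = snapshot_is_consistent([a, b], required_ts=5)
print(before, after)
```

In this sketch, a query that needs freshness up to timestamp 5 would read partition A at timestamp 5 but partition B at timestamp 3 unless B is maintained first, which is the kind of mixed-version snapshot the abstract warns about.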


Bibliographic details

Source
Information Systems, 2017, No. 10, pp. 48-58 (11 pages)
Authors
Qu Weiping; Dessloch Stefan
Author affiliations

Univ Kaiserslautern, Heterogeneous Informat Syst Grp, Kaiserslautern, Germany;

Univ Kaiserslautern, Heterogeneous Informat Syst Grp, Kaiserslautern, Germany;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords
Distributed snapshot maintenance; Incremental ETL pipeline


Similar literature

1. Extracting deltas from column oriented NoSQL databases for different incremental applications and diverse data targets [J]. Yong Hu, Stefan Dessloch. Data & Knowledge Engineering, 2014, No. sepa
2. Data partition optimisation for column-family NoSQL databases [J]. Meng-Ju Hsieh, Li-Yung Ho, Jan-Jan Wu. International Journal of Big Data Intelligence, 2017, No. 4
3. A partitioning framework for Cassandra NoSQL database using Rendezvous hashing [J]. Elghamrawy Sally M., Hassanien Aboul Ella. Journal of Supercomputing, 2017, No. 10
4. On-Demand Snapshot Maintenance in Data Warehouses Using Incremental ETL Pipeline [C]. Weiping Qu, Stefan Dessloch. International Conference on Big Data Analytics and Knowledge Discovery, 2017
5. Horizontal NoSQL database partitioning with data mining techniques [D]. Sauer, Brian. 2014
6. Automatic scaling and maintenance of a NoSQL database [O]. Nygaard Knut, Larsen Eivind Siqveland. 2014
