S2D: Shared Distributed Datasets, Storing Shared Data for Multiple and Massive Queries Optimization in a Distributed Data Warehouse

Abstract

Nowadays, with the constantly increasing amount of data, we are facing a growing number of users whose data access is frequent and massively concurrent. This large number of users poses multiple-query optimization problems. In a distributed data warehousing system such as Hadoop/Hive, queries are evaluated one at a time and processed with the MapReduce paradigm. Massive query execution usually overloads and slows down the entire distributed environment, mainly because of multiple data scan tasks. In this paper, we aim to optimize multiple-query execution performance on Hive. We propose Shared Distributed Datasets (S2D), a method that dynamically looks for and shares common data among queries. The evaluation shows that, compared to Hive, S2D consumes on average 20% less memory in the Map-scan task and is 12% faster in execution time on interactive and reporting queries from TPC-DS.
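The abstract does not describe how S2D finds or stores shared data, so the following is only a minimal illustration of the general idea of sharing a scan among queries, not the authors' method. It assumes in-memory tables and queries reduced to a single filter over one table; the function `shared_scan`, the sample rows, and the query tuples are all hypothetical names introduced here.

```python
from collections import defaultdict


def shared_scan(tables, queries):
    """Answer every query while reading each table only once.

    tables:  dict mapping table name -> list of row dicts (hypothetical layout)
    queries: list of (query_id, table_name, predicate) tuples (hypothetical)
    Returns (results, scan_count).
    """
    # Group queries by the table they read, so one scan can serve the group.
    by_table = defaultdict(list)
    for query_id, table_name, predicate in queries:
        by_table[table_name].append((query_id, predicate))

    results = {query_id: [] for query_id, _, _ in queries}
    scan_count = 0
    for table_name, group in by_table.items():
        scan_count += 1  # a single pass over this table for the whole group
        for row in tables[table_name]:
            for query_id, predicate in group:
                if predicate(row):
                    results[query_id].append(row)
    return results, scan_count


# Illustrative data only; the table name echoes TPC-DS but the rows are made up.
tables = {
    "store_sales": [
        {"item": "a", "qty": 1},
        {"item": "b", "qty": 5},
        {"item": "c", "qty": 9},
    ]
}
queries = [
    ("q1", "store_sales", lambda r: r["qty"] > 2),
    ("q2", "store_sales", lambda r: r["item"] == "a"),
]
results, scans = shared_scan(tables, queries)
print(scans)  # number of table passes needed to answer both queries
```

Evaluating the two queries one at a time would read `store_sales` twice; grouping them by table lets a single pass serve both, which is the kind of redundant scan work the abstract identifies as the main cause of overload.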

Bibliographic record

Source
International Conference on Big Data Analytics and Knowledge Discovery | 2017 | pp. 42-50 | 9 pages
Authors
Rado Ratsimbazafy; Omar Boussaid; Fadila Bentayeb
Original format: PDF
Keywords
Big data warehousing; Query optimization; Distributed environment; Data sharing

机译：大数据仓库;查询优化;分布式环境;资料共享;
Date indexed: 2022-08-26 13:48:58

Similar literature

1. Grid-based architecture for sharing distributed massive datasets [J]. Mohammed Bakri Bashir, Muhammad Shafie Abd Latiff, Adil Yousif. International Journal of Communication Networks and Distributed Systems. 2015, issue 2a3
2. Data Share House: An Architecture to Handle Distributed Data Warehouse Management Issues [J]. Jaiteg Singh, Kawaljeet Singh. Advances in Applied Computational Mechanics. 2014, issue 2
3. Data Share House: An Architecture to Handle Distributed Data Warehouse Management Issues [J]. Jaiteg Singh, Kawaljeet Singh. Advances in Computational Sciences and Technology. 2014, issue 2
4. S2D: Shared Distributed Datasets, Storing Shared Data for Multiple and Massive Queries Optimization in a Distributed Data Warehouse [C]. Rado Ratsimbazafy, Omar Boussaid, Fadila Bentayeb. International Conference on Big Data Analytics and Knowledge Discovery. 2017
5. Combinatorial Optimization on Massive Datasets: Streaming, Distributed, and Massively Parallel Computation [D]. Assadi, Sepehr. 2018
6. Shared data for intensity modulated radiation therapy (IMRT) optimization research: the CORT dataset [O]. David Craft, Mark Bangert, Troy Long. 2014
7. ABACUS: A distributed middleware for privacy preserving data sharing across private data warehouses [O]. Fatih Emekci, Divyakant Agrawal, Amr El Abbadi. 2005
8. Research in Network Data Management Resource Sharing. Optimization Problems in Distributed Data Management [R]. Belford, G. G. 1976
