Building a Hybrid Warehouse: Efficient Joins between Data Stored in HDFS and Enterprise Warehouse

Tian Yuanyuan; Ozcan Fatma; Zou Tao; Goncalves Romulo; Pirahesh Hamid

Abstract

The Hadoop Distributed File System (HDFS) has become an important data repository in the enterprise as the center for all business analytics, from SQL queries and machine learning to reporting. At the same time, enterprise data warehouses (EDWs) continue to support critical business analytics. This has created the need for a new generation of a special federation between Hadoop-like big data platforms and EDWs, which we call the hybrid warehouse. There are many applications that require correlating data stored in HDFS with EDW data, such as the analysis that associates click logs stored in HDFS with the sales data stored in the database. All existing solutions reach out to HDFS and read the data into the EDW to perform the joins, assuming that the Hadoop side does not have efficient SQL support.

Bibliographic details

Source
ACM Transactions on Database Systems, 2016, No. 4, pp. 21.1-21.38 (38 pages)
Authors
Tian Yuanyuan; Ozcan Fatma; Zou Tao; Goncalves Romulo; Pirahesh Hamid
Author affiliations

IBM Res Almaden, 650 Harry Rd, San Jose, CA 95120 USA;

IBM Res Almaden, 650 Harry Rd, San Jose, CA 95120 USA;

Google, 1600 Amphitheatre Pkwy, Mountain View, CA 94043 USA;

Netherlands eSci Ctr, Sci Pk 140 Matrix 1, NL-1098 XG Amsterdam, Netherlands;

IBM Res Almaden, 650 Harry Rd, San Jose, CA 95120 USA;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
Distributed join; join on Hadoop; Bloom filter; SQL-on-Hadoop; hybrid warehouse; federation; query push-down; cost model;

Similar literature

1. Efficiency of bitmap join indexes for optimising star join queries in relational data warehouses [J]. Mohammed Yahyaoui, Souad Amjad, Lamia Benameur. International Journal of Computational Intelligence Studies, 2020, No. 3.
2. Parallel Star Join+DataIndexes: efficient query processing in data warehouses and OLAP [J]. Datta A., VanderMeer D., Ramamritham K. IEEE Transactions on Knowledge and Data Engineering, 2002, No. 6.
3. Building Data Warehouses Using The Enterprise Modeling Framework [J]. Chan Joseph O. Journal of International Technology and Information Management, 2004, No. 1.
4. EIHJoin: An hash join with building index in bucket in column store data warehouse [C]. Dateng Hao, Li Sun. IET International Conference on Smart and Sustainable City 2013, 2013.
5. Designing a clinical data warehouse to store relevant data for assessment of attention deficit/hyperactivity disorder (ADHD) in children [D]. Ortiz Fournier, Lillian Vanessa. 2011.
6. The Enterprise Data Trust at Mayo Clinic: a semantically integrated warehouse of biomedical data [O]. Christopher G Chute, Scott A Beck, Thomas B Fisk. 2010.
7. Parallel Star Join + Data Indexes: efficient query processing in data warehouses and OLAP [O]. Datta Anindya, VanderMeer Debra, Ramamritham Krithi. 2002.
