Oracle In-Database Hadoop: When MapReduce Meets RDBMS

Abstract

Big data is the tar sands of the data world: vast reserves of raw, gritty data whose valuable information content can only be extracted at great cost. MapReduce is a popular parallel programming paradigm well suited to the programmatic extraction and analysis of information from these unstructured Big Data reserves. The Apache Hadoop implementation of MapReduce has become an important player in this market due to its ability to exploit large networks of inexpensive servers. The increasing importance of unstructured data has driven interest in MapReduce and its Apache Hadoop implementation, which in turn has led data processing vendors to take an interest in supporting this programming style. Oracle RDBMS has supported the MapReduce paradigm for many years through the mechanism of user-defined pipelined table functions and aggregation objects. However, this support has not been Hadoop source compatible: native Hadoop programs had to be rewritten before they could be used in this framework. The ability to run Hadoop programs inside the Oracle database provides a versatile solution to database users, allowing them to use programming skills they may already possess and to exploit the growing Hadoop ecosystem. In this paper, we describe a prototype of Oracle In-Database Hadoop that supports the running of native Hadoop applications written in Java. This implementation executes Hadoop applications using the efficient parallel capabilities of the Oracle database and a subset of the Apache Hadoop infrastructure. This system's target audience includes both SQL and Hadoop users. We discuss the architecture and design, and in particular, demonstrate how MapReduce functionalities are seamlessly integrated within SQL queries. We also share our experience in building such a system within the Oracle database and follow-on topics that we think are promising areas for exploration.

Bibliographic record

Source: International Conference on Management of Data, 2011, pp. 779-789 (11 pages)
Authors: Xueyuan Su; Garret Swart
Original format: PDF
Keywords: Parallel query execution; MapReduce; Hadoop
