
Distributed frameworks towards building an open data architecture.

Abstract

Data is everywhere. Current advances in digital technology and social media, together with the ease with which application services can interact with a wide variety of systems, are generating tremendous volumes of data. Because of this variety of services, data formats are no longer restricted to structured types such as text; applications also generate unstructured content such as social media posts, videos, and images. Generated data is of no use unless it is stored and analyzed to derive value. Traditional database systems come with limitations on data format and schema, access rates, storage sizes, and so on. Hadoop is an Apache open-source distributed framework that reliably stores huge datasets of differently formatted data on its file system, the Hadoop Distributed File System (HDFS), and processes the data stored on HDFS using the MapReduce programming model.

This thesis is about building a data architecture using Hadoop and its related open-source distributed frameworks to support a data flow pipeline on low-cost commodity hardware. The data flow components are data sourcing, storage management on HDFS, and a data access layer. The study also discusses a use case that exercises the architecture's components: Sqoop, a framework for ingesting structured data from a database into Hadoop, and Flume, used to ingest semi-structured streaming Twitter JSON data onto HDFS for analysis. The data sourced with Sqoop and Flume is analyzed using Hive for SQL-like analytics, and at a higher level of the data access layer, Hadoop is compared with an in-memory computing system, Spark. Significant differences in query execution performance are analyzed between the Hadoop and Spark frameworks. This integration helps ingest huge volumes of streaming JSON data of varied structure and derive better value-based analytics using Hive and Spark.
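As a concrete illustration of the structured-data sourcing step, the sketch below drives a Sqoop import from Python by shelling out to the Sqoop CLI. The JDBC URL, credentials, source table, and HDFS target directory are hypothetical placeholders for illustration, not details taken from the thesis.

```python
# A minimal sketch of the Sqoop ingestion step, invoked from Python.
# All connection details below are illustrative placeholders.
import subprocess

subprocess.run([
    "sqoop", "import",
    "--connect", "jdbc:mysql://dbhost/sales",   # hypothetical source database
    "--username", "etl_user",                   # hypothetical account
    "--password-file", "/user/etl/.dbpass",     # keeps the secret off the command line
    "--table", "orders",                        # hypothetical source table
    "--target-dir", "/user/sqoop/orders",       # HDFS landing directory
], check=True)                                  # raise if the import job fails
```

Sqoop executes the import as a MapReduce job, so the rows land on HDFS in parallel and are then visible to Hive or Spark.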
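On the semi-structured side, once Flume has landed the raw tweet JSON on HDFS it can be queried with SQL. Below is a minimal PySpark sketch of that data access layer; the HDFS path, view name, and tweet field `user.screen_name` (standard in Twitter's v1.1 streaming payload) are assumptions for illustration.

```python
# A minimal sketch, assuming Flume has written raw tweet JSON under
# /user/flume/tweets on HDFS; the path and field names are illustrative.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("tweet-analytics-sketch")
         .enableHiveSupport()        # lets spark.sql() also reach Hive tables
         .getOrCreate())

# Spark infers a schema from the semi-structured JSON as it reads.
tweets = spark.read.json("hdfs:///user/flume/tweets")

# Expose the DataFrame to SQL, mirroring the Hive-style analytics
# described in the abstract.
tweets.createOrReplaceTempView("tweets")

spark.sql("""
    SELECT user.screen_name AS screen_name, COUNT(*) AS n_tweets
    FROM tweets
    GROUP BY user.screen_name
    ORDER BY n_tweets DESC
    LIMIT 10
""").show()
```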

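The reported query performance differences between Hive on MapReduce and Spark can be approximated with simple wall-clock timing of the same statement in each engine. The sketch below covers only the Spark side and assumes a Hive table named `tweets` already exists; it is far cruder than a real benchmark.

```python
# A rough timing sketch; the table name and query are illustrative and
# assume the Hive metastore already knows a `tweets` table.
import time
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-vs-spark-timing-sketch")
         .enableHiveSupport()
         .getOrCreate())

QUERY = "SELECT COUNT(*) FROM tweets"

start = time.perf_counter()
spark.sql(QUERY).collect()   # .collect() forces full execution, not just planning
print(f"Spark ran {QUERY!r} in {time.perf_counter() - start:.2f} s")
```

Timing the same statement through the Hive CLI (for example `hive -e "SELECT COUNT(*) FROM tweets"`) gives the MapReduce side of the comparison; repeated interactive queries generally favor Spark because it avoids per-job startup and intermediate disk I/O.
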
Record details

  • Author: Venumuddala, Ramu Reddy
  • Affiliation: University of North Texas
  • Degree-granting institution: University of North Texas
  • Subject: Computer science
  • Degree: M.S.
  • Year: 2015
  • Pagination: 58 p.
  • Total pages: 58
  • Format: PDF
  • Language: English
  • Added to database: 2022-08-17 11:52:20
