
On Construction of a Power Data Lake Platform Using Spark

Abstract

The traditional architecture for data storage and analysis is no longer adequate for today's workloads. As information flows ever faster, big data technology brings clear gains in efficiency and productivity, but a successful migration to big data requires an efficient architecture. In this paper, we propose an architecture that migrates our campus's existing power data storage system onto a big data platform built around a data lake. We use Apache Sqoop to transfer historical data into Apache Hive for storage. Apache Kafka ensures the integrity of the streaming data and serves as the input source for Spark Streaming, which writes the data to HBase. To integrate the data, we apply the data lake concept on top of Hive and HBase; Impala and Apache Phoenix serve as the query engines for Hive and HBase, respectively. Apache Spark analyzes and computes over the data lake quickly, and we choose Apache Superset for visualization.
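The abstract stays at the architecture level. As a rough illustration of the streaming leg (Kafka → Spark Streaming → HBase), the sketch below reads power readings from a Kafka topic with PySpark Structured Streaming and writes each micro-batch to HBase. The broker address, topic name, record schema, table name, column family, and the happybase/Thrift write path are assumptions for illustration, not details taken from the paper.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import (StructType, StructField,
                               StringType, DoubleType, TimestampType)

spark = SparkSession.builder.appName("PowerLakeStreaming").getOrCreate()

# Hypothetical schema for one power-meter reading arriving as JSON.
schema = StructType([
    StructField("meter_id", StringType()),
    StructField("ts", TimestampType()),
    StructField("kwh", DoubleType()),
])

# Read the stream from Kafka (broker and topic are assumed names).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "kafka:9092")
       .option("subscribe", "power-readings")
       .load())

# Kafka delivers bytes; parse the value column into typed fields
# and drop records that failed to parse.
readings = (raw.selectExpr("CAST(value AS STRING) AS json")
            .select(from_json(col("json"), schema).alias("r"))
            .select("r.*")
            .dropna())

def write_to_hbase(batch_df, batch_id):
    """Write one micro-batch to HBase through the Thrift gateway.

    Runs on the driver; collect() is fine for small batches, but a
    real deployment would write per partition on the executors.
    Assumes an HBase Thrift server at hbase-thrift:9090 and a table
    'power_readings' with column family 'd'.
    """
    import happybase
    conn = happybase.Connection("hbase-thrift", port=9090)
    table = conn.table("power_readings")
    for r in batch_df.collect():
        row_key = f"{r.meter_id}#{r.ts.isoformat()}".encode()
        table.put(row_key, {b"d:kwh": str(r.kwh).encode()})
    conn.close()

query = (readings.writeStream
         .foreachBatch(write_to_hbase)
         .option("checkpointLocation", "/tmp/chk/power")  # recovery bookkeeping
         .start())
query.awaitTermination()
```

The job needs the Kafka connector on the classpath, e.g. `spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0`. Once data has landed in both Hive and HBase, a single SparkSession can query the two stores together, which is roughly the integration role the abstract assigns to the data lake.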
