
On Construction of a Power Data Lake Platform Using Spark

Abstract

The traditional architecture for data storage and analysis is no longer adequate for today's workloads. As information flows ever faster, big data technology brings clear gains in efficiency and productivity, but a successful migration to big data requires an efficient architecture. In this paper, we propose an architecture that migrates our campus's existing power data storage system onto a big data platform built around a data lake. We use Apache Sqoop to transfer historical data into Apache Hive for storage. Apache Kafka ensures the integrity of the streaming data and serves as the input source for Spark Streaming, which writes the data to HBase. To integrate the data, we apply the data lake concept on top of Hive and HBase; Impala and Apache Phoenix serve as the query engines for Hive and HBase, respectively. Apache Spark analyzes and computes over the data lake quickly, and we choose Apache Superset for visualization.
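The abstract stays at the architecture level. As a rough illustration of the streaming leg (Kafka → Spark Streaming → HBase), the sketch below reads power readings from a Kafka topic with PySpark Structured Streaming and writes each micro-batch to HBase. The broker address, topic name, record schema, table name, column family, and the happybase/Thrift write path are assumptions for illustration, not details taken from the paper.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import (StructType, StructField,
                               StringType, DoubleType, TimestampType)

spark = SparkSession.builder.appName("PowerLakeStreaming").getOrCreate()

# Hypothetical schema for one power-meter reading arriving as JSON.
schema = StructType([
    StructField("meter_id", StringType()),
    StructField("ts", TimestampType()),
    StructField("kwh", DoubleType()),
])

# Read the stream from Kafka (broker and topic are assumed names).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "kafka:9092")
       .option("subscribe", "power-readings")
       .load())

# Kafka delivers bytes; parse the value column into typed fields
# and drop records that failed to parse.
readings = (raw.selectExpr("CAST(value AS STRING) AS json")
            .select(from_json(col("json"), schema).alias("r"))
            .select("r.*")
            .dropna())

def write_to_hbase(batch_df, batch_id):
    """Write one micro-batch to HBase through the Thrift gateway.

    Runs on the driver; collect() is fine for small batches, but a
    real deployment would write per partition on the executors.
    Assumes an HBase Thrift server at hbase-thrift:9090 and a table
    'power_readings' with column family 'd'.
    """
    import happybase
    conn = happybase.Connection("hbase-thrift", port=9090)
    table = conn.table("power_readings")
    for r in batch_df.collect():
        row_key = f"{r.meter_id}#{r.ts.isoformat()}".encode()
        table.put(row_key, {b"d:kwh": str(r.kwh).encode()})
    conn.close()

query = (readings.writeStream
         .foreachBatch(write_to_hbase)
         .option("checkpointLocation", "/tmp/chk/power")  # recovery bookkeeping
         .start())
query.awaitTermination()
```

The job needs the Kafka connector on the classpath, e.g. `spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0`. Once data has landed in both Hive and HBase, a single SparkSession can query the two stores together, which is roughly the integration role the abstract assigns to the data lake.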
