Multi-dimensional Analysis of Industrial Big Data Based JSON Document

Abstract

Analyzing and mining industrial big data is extremely complicated because the data come from multiple sources, have heterogeneous structure, and are linked by complex correlations, and the growing volume of such data adds to the difficulty. Traditional analysis approaches based on relational databases or data warehouses are not flexible enough to handle multi-source heterogeneous data and are less efficient at search and analysis operations. Based on Spark and Elasticsearch, this paper presents a multi-dimensional analysis method and system for industrial big data. An OLAP model architecture based on the JSON document structure is proposed; it uses a key-value structure to define diverse industrial data flexibly, and the resulting multi-dimensional model is easy to query and analyze. The table structure of the dimension information is converted into a JSON-based document structure, and the dimension information contained in the fact table is stored as nested documents. Elasticsearch stores the document structure tree and builds an inverted index, which improves the efficiency of analysis queries. Query and analysis operations are transformed into traversal and query operations over the document content. The multi-dimensional analysis system based on Elasticsearch is reported to be much more time-efficient than analysis based on Hive.
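
The conversion described in the abstract (dimension tables folded into nested JSON documents, with analysis expressed as traversal of document content) can be illustrated with a minimal sketch. It is not the paper's implementation: the table contents, the field names (`device`, `time`, `measures`), and the helpers `to_document`, `get_path`, and `roll_up` are all hypothetical, and the roll-up runs in plain Python rather than on Spark or Elasticsearch.

```python
import json
from collections import defaultdict

# Hypothetical dimension tables, keyed by primary key (illustrative data only).
DEVICE_DIM = {
    1: {"device_id": 1, "model": "pump-A", "plant": "north"},
    2: {"device_id": 2, "model": "pump-B", "plant": "south"},
}
TIME_DIM = {
    10: {"time_id": 10, "day": "2020-01-01", "month": "2020-01"},
    11: {"time_id": 11, "day": "2020-02-01", "month": "2020-02"},
}

# Hypothetical fact rows holding foreign keys and one measure.
FACT_ROWS = [
    {"device_id": 1, "time_id": 10, "output": 5.0},
    {"device_id": 1, "time_id": 11, "output": 7.0},
    {"device_id": 2, "time_id": 10, "output": 3.0},
]


def to_document(row):
    """Replace each foreign key with the dimension record as a nested document."""
    return {
        "device": DEVICE_DIM[row["device_id"]],
        "time": TIME_DIM[row["time_id"]],
        "measures": {"output": row["output"]},
    }


def get_path(doc, path):
    """Traverse a nested document along a dotted path such as 'device.plant'."""
    value = doc
    for key in path.split("."):
        value = value[key]
    return value


def roll_up(docs, group_path, measure_path):
    """Sum one measure grouped by one dimension attribute (an OLAP roll-up)."""
    totals = defaultdict(float)
    for doc in docs:
        totals[get_path(doc, group_path)] += get_path(doc, measure_path)
    return dict(totals)


documents = [to_document(row) for row in FACT_ROWS]
by_plant = roll_up(documents, "device.plant", "measures.output")
by_month = roll_up(documents, "time.month", "measures.output")

# An equivalent Elasticsearch request body for the first roll-up, assuming the
# documents are indexed with 'device.plant' mapped as a keyword field
# (an assumption; the abstract does not give the mapping).
ES_QUERY = {
    "size": 0,
    "aggs": {
        "by_plant": {
            "terms": {"field": "device.plant"},
            "aggs": {"total_output": {"sum": {"field": "measures.output"}}},
        }
    },
}

print(json.dumps(documents[0], indent=2))
print(by_plant)
print(by_month)
```

Because every fact document carries its dimension attributes inline, a roll-up along any dimension needs no join at query time; this is the property that the abstract attributes to the nested-document model.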