2016 IEEE International Conferences on Big Data and Cloud Computing, Social Computing and Networking, Sustainable Computing and Communication

Visualization and Adaptive Subsetting of Earth Science Data in HDFS: A Novel Data Analysis Strategy with Hadoop and Spark



Abstract

Data analytics is becoming increasingly important in big data applications. Adaptively subsetting large volumes of data to extract events of interest, such as hurricane or thunderstorm centers, and then statistically analyzing and visualizing the subset, is an effective way to analyze ever-growing data. This is particularly crucial for analyzing Earth Science data, such as extreme weather. The Hadoop ecosystem (i.e., HDFS, MapReduce, Hive) provides a cost-efficient big data management environment and is being explored for analyzing big Earth Science data. Our study investigates the potential of a MapReduce-like paradigm to perform statistical calculations, and uses the calculated results to subset and visualize data in a scalable and efficient way. RHadoop and SparkR are deployed to enable R to access and process data in parallel with Hadoop and Spark, respectively. The standard R libraries and tools are used to create and manipulate images. Statistical calculations, such as maximum and average variable values, are carried out with R or SQL. We have developed a strategy that performs querying and visualization within a single phase, and thus significantly improves overall performance in a scalable way. The technical challenges and limitations of both the Hadoop and Spark platforms for R are also discussed.
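The paper's pipeline runs on RHadoop and SparkR; as a language-agnostic illustration only, the core idea of computing global statistics in a MapReduce-like pass and then using them to adaptively subset the data can be sketched in plain Python (the sample records, variable name, and threshold rule below are hypothetical, not taken from the paper):

```python
from functools import reduce

# Hypothetical sample of gridded records: (lat, lon, wind_speed).
records = [
    (25.0, -80.0, 12.0), (25.5, -80.5, 45.0),
    (26.0, -81.0, 63.0), (26.5, -81.5, 38.0),
    (30.0, -85.0, 8.0),  (31.0, -86.0, 5.0),
]

# "Map": project out the variable of interest; "reduce": global statistics.
values = [w for _, _, w in records]
v_max = reduce(max, values)
v_avg = sum(values) / len(values)

# Adaptive subsetting: keep only cells near the detected extreme
# (e.g., a hurricane center), using a threshold derived from the
# statistics just computed, rather than a fixed cutoff.
threshold = (v_max + v_avg) / 2
subset = [r for r in records if r[2] >= threshold]
```

In the paper's setting, the statistics and the subsetting filter would run in parallel over HDFS blocks, and the resulting subset would feed directly into R's plotting tools within the same phase.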


