Efficient time compression earthquake database using hadoop Hive ORC format

Abstract

Today is the age of big data, and big data is often unstructured. Apache Hive is widely used to process and analyze huge datasets, and because its query language resembles SQL, analytical reports are easy to obtain. The main problems are loading and storing unstructured data, and analyzing large amounts of data quickly and in a timely manner. Columnar formats with built-in data compression exist for this purpose, such as ORC (Optimized Row Columnar) and Parquet. In this paper we use the USGS (United States Geological Survey) earthquake dataset; USGS provides multi-dimensional earthquake datasets covering every day, week, and month. We applied Hadoop Hive's ORC format to the monthly USGS earthquake dataset. The ORC format stores the dataset efficiently and without loss, so the important data is kept on HDFS intact. We compare sorted and unsorted ORC datasets on the basis of the time required to load the dataset onto HDFS.
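
The abstract does not describe the authors' Hive setup, so the following is only an illustrative sketch in Python's standard library, not ORC itself and not the authors' method. It uses synthetic timestamps (hypothetical data, not the USGS dataset) to show one reason sorting by time can help a columnar layout: once a time column is sorted, consecutive values can be stored as small gaps (delta encoding), an idea loosely related to the integer run-length/delta encodings ORC applies internally. zlib stands in for ORC's compression codecs, and all function names are hypothetical.

```python
import itertools
import random
import zlib


def synthetic_event_times(n, seed=42):
    # HYPOTHETICAL data: n random event timestamps, in seconds from the start
    # of a 30-day month. A stand-in only, not the USGS earthquake feed.
    rng = random.Random(seed)
    month_seconds = 30 * 24 * 3600
    return [rng.randrange(month_seconds) for _ in range(n)]


def compressed_column_size(values):
    # Store one column contiguously (newline-separated text), as a columnar
    # layout would, then deflate it with zlib and return the compressed size.
    text = "\n".join(str(v) for v in values).encode("ascii")
    return len(zlib.compress(text, 9))


def delta_encode(sorted_values):
    # Keep the first value, then only the gaps between consecutive values.
    # On a sorted column these gaps are small non-negative integers.
    if not sorted_values:
        return []
    return [sorted_values[0]] + [b - a for a, b in zip(sorted_values, sorted_values[1:])]


def delta_decode(deltas):
    # Invert delta_encode with a running sum.
    return list(itertools.accumulate(deltas))


def compare_time_column(n=10_000):
    # Compressed size of the same time column stored three ways.
    times = synthetic_event_times(n)
    sorted_times = sorted(times)
    return {
        "unsorted": compressed_column_size(times),
        "sorted": compressed_column_size(sorted_times),
        "sorted+delta": compressed_column_size(delta_encode(sorted_times)),
    }


if __name__ == "__main__":
    for layout, size in compare_time_column().items():
        print(f"{layout:>12}: {size} bytes after zlib")
```

Running it prints the three compressed sizes; the exact numbers depend on the synthetic data, so none are quoted here. The sketch measures storage size only, whereas the paper's comparison is the time needed to load sorted and unsorted ORC data onto HDFS, which this code does not model.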

Bibliographic record

Source
International Conference on Intelligent Computing and Control Systems | 2017 | pp. 1361-1364 | 4 pages
Authors
Pramod Ravindra Patil; Vivek Kshirsagar;
Original format: PDF
Keywords
Earthquakes; Big Data; Loading; Indexes; Computer architecture; Control systems; Ecosystems;

Similar literature

1. The impact of columnar file formats on SQL-on-hadoop engine performance: A study on ORC and Parquet [J]. Concurrency, practice and experience. 2020, no. 5.

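2. Hadoopで並列分散処理を体験！: Hadoop 2.6 + Tez + Hiveの実行例 (Experience parallel distributed processing with Hadoop!: Hadoop 2.6 + Tez + Hive examples) [J]. 鰺坂　明, 濱野　賢一朗. パソコン；ワークステーションによるソフトウェア開発のテクニカル情報誌 (technical magazine on software development with PCs and workstations). 2015, no. 3.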

3. Relational Query Optimization Technique using Space Efficient File Formats of Hadoop for the Big Data Warehouse System [J]. Sudhanshu Shekhar Bisoyi, Pragnyaban Mishra, S. N. Mishra. Indian Journal of Science and Technology. 2017, no. 19.

4. Efficient Time Compression Earthquake Database Using Hadoop Hive ORC Format [C]. Pramod Ravindra Patil, Vivek Kshirsagar. International Conference on Intelligent Computing and Control Systems. 2017.

5. Designing Time Efficient Real Time Hardware in the Loop Simulation Using Input Profile Temporal Compression [D]. Chatterjee, Sourindu. 2017.

6. Sequence Compression Benchmark (SCB) database—A comprehensive evaluation of reference-free compressors for FASTA-formatted sequences [O]. Kirill Kryukov, Mahoko Takahashi Ueda, So Nakagawa, 2020.

7. The impact of columnar file formats on SQL-on-hadoop engine performance: A study on ORC and Parquet [O]. Todor Ivanov, Matteo Pergolesi. 2019.

