Astronomical Data Analysis Software and Systems Conference

Pre-feasibility Study of Astronomical Data Archive Systems Powered by Public Cloud Computing and Hadoop Hive

Abstract

The size of astronomical observational data is increasing yearly. For example, while the Atacama Large Millimeter/submillimeter Array is expected to generate 200 TB of raw data every year, the Large Synoptic Survey Telescope is estimated to produce 15 TB of raw data every night. Since computing performance grows much more slowly than the volume of astronomical data, providing high-performance computing (HPC) resources together with scientific data will become common in the next decade. However, the installation and maintenance costs of an HPC system can be burdensome for the provider. I consider public cloud computing as an alternative way to obtain sufficient computing resources inexpensively. I build Hadoop and Hive clusters on a virtual private server (VPS) service and on Amazon Elastic MapReduce (EMR), and measure their performance. The performance of the VPS cluster varies from day to day, while the EMR clusters are relatively stable. Since partitioning is essential for Hive, several partitioning algorithms are also evaluated. In this paper, I report the results of these benchmarks and the performance optimizations applied in a cloud computing environment.
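
To illustrate the kind of Hive partitioning the abstract refers to, below is a minimal sketch of a date-partitioned table and a query that benefits from partition pruning. The table name, columns, partition key, and the use of the PyHive client are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch; schema, partition key, and client choice are assumptions.
from pyhive import hive

# Connect to a HiveServer2 endpoint; the host name is a placeholder.
conn = hive.Connection(host="hive-master.example.org", port=10000,
                       database="default")
cursor = conn.cursor()

# Partition the catalog by observation date: each obs_date value becomes its
# own directory, so queries filtering on obs_date scan only those partitions.
cursor.execute("""
    CREATE TABLE IF NOT EXISTS source_catalog (
        source_id BIGINT,
        ra_deg    DOUBLE,
        dec_deg   DOUBLE,
        flux      DOUBLE
    )
    PARTITIONED BY (obs_date STRING)
    STORED AS ORC
""")

# Partition pruning: only the 2012-10-01 partition is read.
cursor.execute("""
    SELECT COUNT(*)
    FROM source_catalog
    WHERE obs_date = '2012-10-01' AND flux > 1.0
""")
print(cursor.fetchall())
```

Without the PARTITIONED BY clause, the same query would scan the entire table; the abstract's benchmarks compare several such partitioning schemes.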
