Journal: Cloud Computing, IEEE Transactions on

Segmented In-Advance Data Analytics for Fast Scientific Discovery

Abstract

Scientific discovery usually involves data generation, data preprocessing, data storage, and data analysis. Once the data volume exceeds a few terabytes (TB) in a single simulation run, the data movement that occurs in each cycle of scientific discovery remains the bottleneck in most scientific big data applications. Much research has been conducted on reducing data movement. Among the existing efforts, and based on our previous research, reusing analysis results shows significant potential for optimizing the data movement between analysis operations. In this work, we propose the Segmented In-Advance (SIA) data analytics approach for optimizing data movement, and we provide a cloud-based, elastic, distributed in-memory database to manage the intermediate analysis results. The fundamental idea of SIA is to analyze the history of operations and predict the analytics operations likely to be of interest in the future. Each predicted analysis operation is performed in advance on a more finely segmented dataset, and the segmented results are distributed in an in-memory key-value store for future reuse. The evaluation shows that the SIA approach achieves a 1.2X-6.1X speedup and that the in-memory distributed data store scales well. SIA is a promising data movement reduction solution for scientific big data applications and fast scientific discovery.
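
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the frequency-based predictor, the fixed `SEGMENT_SIZE`, the function names, and the plain `dict` standing in for the distributed in-memory key-value store are all assumptions made for illustration. It also assumes the predicted operation is decomposable (here `sum`, `max`, or `min`), so that per-segment results can be combined into the result for a larger range.

```python
from collections import Counter

SEGMENT_SIZE = 4  # hypothetical segment length, chosen only for the example
OPERATIONS = {"sum": sum, "max": max, "min": min}  # decomposable operations


def predict_next_operation(history):
    """Naive stand-in predictor: assume the most frequent past operation recurs."""
    return Counter(history).most_common(1)[0][0]


def precompute_segments(data, op_name, store):
    """Run the predicted operation in advance on each segment and cache the result."""
    op = OPERATIONS[op_name]
    for start in range(0, len(data), SEGMENT_SIZE):
        segment = data[start:start + SEGMENT_SIZE]
        store[(op_name, start // SEGMENT_SIZE)] = op(segment)


def range_query(data, op_name, lo, hi, store):
    """Apply op over data[lo:hi] (assumed non-empty), reusing cached segment results.

    Returns the result and the number of raw elements that had to be read.
    """
    op = OPERATIONS[op_name]
    hi = min(hi, len(data))
    partials, raw_reads, pos = [], 0, lo
    while pos < hi:
        seg_id = pos // SEGMENT_SIZE
        seg_start = seg_id * SEGMENT_SIZE
        seg_end = min(seg_start + SEGMENT_SIZE, len(data))
        key = (op_name, seg_id)
        if pos == seg_start and seg_end <= hi and key in store:
            # Segment fully covered by the query: reuse the cached result.
            partials.append(store[key])
            pos = seg_end
        else:
            # Partially covered or uncached segment: fall back to the raw data.
            stop = min(seg_end, hi)
            chunk = data[pos:stop]
            partials.append(op(chunk))
            raw_reads += len(chunk)
            pos = stop
    return op(partials), raw_reads


data = list(range(1, 13))        # toy dataset: 1..12
store = {}                       # stand-in for the in-memory key-value store
history = ["sum", "max", "sum"]  # hypothetical log of past analysis operations
predicted = predict_next_operation(history)
precompute_segments(data, predicted, store)
result, raw_reads = range_query(data, predicted, 2, 11, store)
print(predicted, result, raw_reads)
```

In this toy run, the query covers nine elements but reads only five of them from the raw data, because the middle segment is served from the cache. Finer segments let more of an arbitrary query range be served this way, at the cost of more cached entries.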