Segmented In-Advance Data Analytics for Fast Scientific Discovery

Jialin Liu; Yong Chen

Abstract

Scientific discovery usually involves data generation, data preprocessing, data storage, and data analysis. As the data volume exceeds a few terabytes (TB) in a single simulation run, data movement, which happens during each cycle of scientific discovery, continues to be the bottleneck in most scientific big data applications. Much research has been conducted on reducing data movement. Among the existing efforts, and based on our previous research, reusing analysis results shows significant potential for optimizing data movement between analysis operations. In this work, we propose the Segmented In-Advance (SIA) data analytics approach for optimizing data movement, and we also provide a cloud-based, elastic, distributed in-memory database to manage the intermediate analysis results. The fundamental idea of the SIA approach is to analyze the history of operations and to predict the analytics operations that are likely to be of interest in the future. The predicted analysis operation is performed in advance on a more finely segmented dataset, and the segmented results are distributed in an in-memory key-value store for future reuse. The evaluation shows that the SIA approach achieves a 1.2X-6.1X speedup. The evaluation also shows good scalability of the in-memory distributed data store. The proposed SIA approach is a promising data movement reduction solution for scientific big data applications and fast scientific discovery.
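The workflow described in the abstract (predict an operation from history, run it in advance on segments, cache the segmented results, reuse them later) can be illustrated with a minimal sketch. This is not the authors' implementation: the frequency-based predictor, the fixed `SEGMENT_SIZE`, the restriction to decomposable operations, and the plain Python dict standing in for the distributed in-memory key-value store are all assumptions made for illustration, and every function name is hypothetical.

```python
from collections import Counter

SEGMENT_SIZE = 4  # hypothetical segment length, chosen only for illustration

# Decomposable operations: (function applied to a segment, combiner of partial results).
OPS = {
    "sum": (sum, sum),
    "max": (max, max),
    "min": (min, min),
}

def predict_next_op(history):
    """Assumed predictor: pick the operation seen most often in the history."""
    return Counter(history).most_common(1)[0][0]

def precompute(data, op, store):
    """Run `op` in advance on every full segment and cache the partial results."""
    segment_fn, _ = OPS[op]
    for start in range(0, len(data) - SEGMENT_SIZE + 1, SEGMENT_SIZE):
        key = (op, start // SEGMENT_SIZE)
        store[key] = segment_fn(data[start:start + SEGMENT_SIZE])

def query(data, op, lo, hi, store):
    """Evaluate `op` over data[lo:hi], reusing cached segment results where possible.

    Returns (result, number of raw elements that still had to be read).
    """
    segment_fn, combine = OPS[op]
    partials, raw_reads, i = [], 0, lo
    while i < hi:
        seg = i // SEGMENT_SIZE
        seg_end = (seg + 1) * SEGMENT_SIZE
        if i % SEGMENT_SIZE == 0 and seg_end <= hi and (op, seg) in store:
            # Whole segment is covered by the query and already cached: reuse it.
            partials.append(store[(op, seg)])
            i = seg_end
        else:
            # Partial or uncached segment: fall back to the raw data.
            end = min(seg_end, hi)
            partials.append(segment_fn(data[i:end]))
            raw_reads += end - i
            i = end
    return combine(partials), raw_reads

# Usage: a dict stands in for the in-memory key-value store (assumption).
history = ["sum", "max", "sum"]
data = list(range(16))
store = {}
op = predict_next_op(history)
precompute(data, op, store)
result, raw_reads = query(data, op, 2, 14, store)
```

In this toy run the query covers 12 elements but reads only the 4 that fall in partially covered segments; the rest comes from cached segment results. Finer segments raise the chance that a later query range aligns with cached results, at the cost of more keys in the store.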

Bibliographic details

Source
Cloud Computing, IEEE Transactions on | 2020, Issue 2 | pp. 432-442 | 11 pages
Authors
Jialin Liu; Yong Chen
Affiliations

Department of Computer Science, Texas Tech University, Lubbock, TX;

Department of Computer Science, Texas Tech University, Lubbock, TX;

Original format: PDF
Language: English
Keywords
Data analysis; Data models; Distributed databases; Cloud computing; Big data; Meteorology; Additives

Similar literature

1. Preface: Visualization and data analytics for scientific discovery [J]. Childs Hank, Cappello Franck. Parallel Computing. 2016, Issue jula
2. Data management, code deployment, and scientific visualization to enhance scientific discovery in fusion research through advanced computing [J]. D. P. Schissel, A. Finkelstein, I. T. Foster. Fusion Engineering and Design. 2002, Issue 3
3. Priority research directions for in situ data management: Enabling scientific discovery from diverse data sources [J]. Tom Peterka, Deborah Bard, Janine C Bennett. International Journal of High Performance Computing Applications. 2020, Issue 4
4. In-advance data analytics for reducing time to discovery [C]. Jialin Liu, Yin Lu, Yong Chen. IEEE International Congress on Big Data. 2014
5. Information synthesis: A mixed-initiative meta-analytic approach to facilitate knowledge discovery from scientific text. [D]. Blake, Catherine Lesley. 2003
6. Giving Raw Data a Chance to Talk: A Demonstration of Exploratory Visual Analytics with a Pediatric Research Database Using Microsoft Live Labs Pivot to Promote Cohort Discovery Research and Quality Assessment [O]. Teeradache Viangteeravat, Naga Satya V. Rao Nagisetty. 2014
7. On Some Discoveries in the Field of Scientific Methods for Management within the Concept of Analytic Hierarchy Process [O]. Pawel Tadeusz Kazibudzki. 2013
