IEEE International Congress on Big Data

Push-based system for molecular simulation data analysis

Abstract

Many scientific fields generate big data and require its manipulation. Known scientific data analysis systems, as well as traditional DBMSs, follow a pull-based architectural design, in which the executed queries determine which data is fetched. This design suits traditional transaction-based workloads, where many queries each retrieve a small portion of data located at various places in the database, but it is ill-suited to applications that perform complex analysis over most of the data. Such a design incurs redundant and random I/O, which considerably reduces the data throughput of the system. In this paper, we design and implement a push-based system that allows high-throughput data analysis in the process of scientific discovery. Our design improves throughput in two ways: i) it uses a sequential scan-based I/O framework that loads the data into main memory, and ii) it then pushes the loaded data to a number of pre-programmed queries. In this way, the system avoids the unnecessary I/O overhead imposed by randomized, index-based scans, as well as the overhead of reading the same data multiple times if each query were fed separately. Considering the amount of data and the number of executed queries, we believe our system provides a substantial improvement over current data analysis systems. The efficiency of the proposed system is supported by the results of extensive experiments using real molecular simulation (MS) data, in which the running times of our system are compared with those of the GROMACS system. The comparison shows the advantage and potential of using such a push-based system for data analysis.
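The two-step design in the abstract (one sequential scan into memory, then pushing each loaded frame to every pre-programmed query) can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the trajectory is modeled as an in-memory list of frames (each a list of atom coordinates), the hypothetical `frames_read` counter stands in for I/O volume, and the two queries (maximum radius of gyration and mean centroid, both with equal atom masses) are illustrative choices. `TrajectoryStore`, `run_push`, and `run_pull` are hypothetical names, not an API from the paper or from GROMACS. The pull baseline models only the "each query reads the data separately" cost, not random index-based access.

```python
import math
from typing import Iterator, List, Sequence, Tuple

Atom = Tuple[float, float, float]
Frame = List[Atom]


class TrajectoryStore:
    """Hypothetical stand-in for an on-disk trajectory; counts frames fetched."""

    def __init__(self, frames: Sequence[Frame]) -> None:
        self._frames = list(frames)
        self.frames_read = 0  # proxy for I/O volume

    def scan(self, chunk_size: int) -> Iterator[List[Frame]]:
        """Sequential scan: yield consecutive chunks of frames in file order."""
        for start in range(0, len(self._frames), chunk_size):
            chunk = self._frames[start:start + chunk_size]
            self.frames_read += len(chunk)
            yield chunk


def centroid(frame: Frame) -> Atom:
    n = len(frame)
    return (sum(a[0] for a in frame) / n,
            sum(a[1] for a in frame) / n,
            sum(a[2] for a in frame) / n)


class MaxRadiusOfGyration:
    """Query: largest radius of gyration over all frames (equal masses assumed)."""

    def __init__(self) -> None:
        self.best = 0.0

    def consume(self, frame: Frame) -> None:
        cx, cy, cz = centroid(frame)
        msd = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                  for x, y, z in frame) / len(frame)
        self.best = max(self.best, math.sqrt(msd))

    def result(self) -> float:
        return self.best


class MeanCentroid:
    """Query: per-frame centroid, averaged over all frames."""

    def __init__(self) -> None:
        self.total = [0.0, 0.0, 0.0]
        self.count = 0

    def consume(self, frame: Frame) -> None:
        for i, c in enumerate(centroid(frame)):
            self.total[i] += c
        self.count += 1

    def result(self) -> Tuple[float, ...]:
        return tuple(t / self.count for t in self.total)


def run_push(store, queries, chunk_size=2):
    """Push model: one sequential pass; every loaded frame goes to every query."""
    for chunk in store.scan(chunk_size):
        for frame in chunk:
            for q in queries:
                q.consume(frame)
    return [q.result() for q in queries]


def run_pull(store, queries, chunk_size=2):
    """Baseline: each query reads the whole trajectory on its own."""
    for q in queries:
        for chunk in store.scan(chunk_size):
            for frame in chunk:
                q.consume(frame)
    return [q.result() for q in queries]


frames = [
    [(0, 0, 0), (2, 0, 0)],
    [(0, 0, 0), (4, 0, 0)],
    [(1, 1, 0), (1, 3, 0)],
    [(0, 0, 2), (0, 0, 4)],
]
push_store, pull_store = TrajectoryStore(frames), TrajectoryStore(frames)
push = run_push(push_store, [MaxRadiusOfGyration(), MeanCentroid()])
pull = run_pull(pull_store, [MaxRadiusOfGyration(), MeanCentroid()])
print(push_store.frames_read, pull_store.frames_read)  # prints "4 8"
```

Both models return the same answers, but with two queries the pull baseline reads every frame twice, whereas the push model reads each frame once regardless of how many queries are registered. This is the redundant-read saving the abstract describes.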