Multiple query optimization support for data analysis applications.

Abstract

The efficient storage, management, and manipulation of large datasets are important in many fields of science, engineering, and business. Simulations and experimental measurements are the main sources of data in these fields, and the amount of data available for analysis is growing rapidly, owing both to the increased capability to collect and store data and to the increased capability to process it. We broadly define applications in these fields as data analysis applications. Their main characteristic is that they usually access only a subset of the available data, the hot spots, which are the data points of highest interest in generating data products.

In many cases, data analysis is employed in a collaborative environment, where multiple clients access the same datasets and perform similar processing on the data. For instance, in medical training, a large group of students may want to simultaneously explore a similar set of digitized microscopy slides, or visualize the same high-resolution Magnetic Resonance Imaging (MRI) results. In this case, the data server needs to process multiple queries simultaneously to minimize latency to the clients.

Previously investigated multi-query optimization (MQO) techniques do not account for the user-defined processing of data and the user-defined aggregation methods that are typical of data analysis queries. Therefore, the problem we investigate in this dissertation is multiple query optimization for data analysis applications. It can be broadly defined as a set of techniques aimed at minimizing the total cost of processing a series of queries by creating an optimized access plan for the entire set of queries and by reusing previously computed aggregates.

The main goal of our work is to provide a generic optimization framework that can be used as a common platform to deploy data analysis applications that are able to efficiently handle multiple simultaneous queries and can leverage previously computed results to partially or fully compute new queries.

In this work, we show significant improvements on several data management issues. These include the integration of an active semantic cache approach coupled with a data transformation model for reusing data and computation, a functional decomposition framework for exposing reuse sites, query scheduling policies, and cache replacement policies. Finally, we show how all these techniques can be adequately implemented over new computation and execution model paradigms such as clusters of PCs and highly distributed, heterogeneous data grid environments.
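
As a rough illustration of the idea of leveraging previously computed aggregates to partially or fully answer a new query, the following is a minimal sketch of a toy semantic cache for range-sum queries over a one-dimensional dataset. It is not the dissertation's framework: the class name `SemanticRangeCache`, the `raw_reads` counter, and the choice of a sum aggregate are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open index range [lo, hi)


class SemanticRangeCache:
    """Toy semantic cache for range-sum queries (hypothetical example)."""

    def __init__(self, data: List[float]) -> None:
        self.data = data
        self.cache: Dict[Interval, float] = {}
        self.raw_reads = 0  # data points read from the base dataset

    def _scan(self, lo: int, hi: int) -> float:
        # Compute an aggregate directly from the base data (the expensive path).
        self.raw_reads += hi - lo
        return sum(self.data[lo:hi])

    def query(self, lo: int, hi: int) -> float:
        # Full reuse: the exact aggregate was computed before.
        if (lo, hi) in self.cache:
            return self.cache[(lo, hi)]

        # Partial reuse: cached ranges that lie entirely inside the query.
        usable = sorted(iv for iv in self.cache if lo <= iv[0] and iv[1] <= hi)

        total = 0.0
        pos = lo  # everything in [lo, pos) is already accounted for
        for c_lo, c_hi in usable:
            if c_lo < pos:
                continue  # overlaps an already covered region; skip it
            total += self._scan(pos, c_lo)     # gap not covered by the cache
            total += self.cache[(c_lo, c_hi)]  # reused aggregate
            pos = c_hi
        total += self._scan(pos, hi)           # trailing gap

        self.cache[(lo, hi)] = total
        return total
```

With `cache = SemanticRangeCache(list(range(10)))`, calling `cache.query(2, 5)` reads three data points, and a later `cache.query(0, 8)` reads only the five points outside the cached range instead of all eight. A real system would also need the cache replacement and query scheduling policies mentioned in the abstract, which this sketch omits.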

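Bibliographic record

  • Author

    Andrade, Henrique C. M.

  • Author affiliation

    University of Maryland College Park

  • Degree-granting institution: University of Maryland College Park
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2003
  • Pages: 209 p.
  • Total pages: 209
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Automation technology, computer technology
  • Keywords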
