Conference: International Conference on Cloud and Autonomic Computing

A Framework for Managing Continuous Query Evaluations over Voluminous, Multidimensional Datasets

Abstract

Efficient access to voluminous multidimensional datasets is essential for several scientific applications, including real-time analysis and visualization. Fast-evolving datasets present unique challenges during retrieval. Keeping data up to date can be expensive and may involve repeated data queries, excessive data movement, and redundant data preprocessing. This paper focuses on the efficient manipulation of query results when the dataset is continuously evolving. Our approach provides an automated and scalable tracking and caching mechanism to evaluate continuous queries over data stored in a distributed storage system. Among the storage nodes, one or more nodes are selected using an election algorithm based on CPU and memory utilization. These selected nodes ensure that the query output contains the most recent data arrivals, and they cache the metadata of the query output. The approach is evaluated in the context of Galileo, our distributed data storage framework. Galileo is designed for managing multidimensional time-series datasets generated in geospatial observational settings, e.g., data generated by remote sensing equipment and sensor networks. We describe how we use the metadata graph to push data preprocessing jobs onto the storage system during continuous query processing and to selectively download subsets of the query output. Our performance benchmarks demonstrate the efficacy of our approach.
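
The abstract states only that nodes are elected based on CPU and memory utilization; it does not give the scoring rule. As a rough illustration, the sketch below picks the least-loaded nodes using a weighted sum of the two utilizations. The `NodeLoad` type, the `load_score` and `elect_nodes` functions, the equal default weighting, and the tie-break by node id are all assumptions made for this example and are not taken from the paper or from Galileo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeLoad:
    """Utilization snapshot for one storage node (hypothetical structure)."""
    node_id: str
    cpu: float     # CPU utilization in [0.0, 1.0]
    memory: float  # memory utilization in [0.0, 1.0]


def load_score(node: NodeLoad, cpu_weight: float = 0.5) -> float:
    """Weighted load of a node; the weighting scheme is an assumption."""
    return cpu_weight * node.cpu + (1.0 - cpu_weight) * node.memory


def elect_nodes(nodes: list[NodeLoad], count: int = 1,
                cpu_weight: float = 0.5) -> list[str]:
    """Return the ids of the `count` least-loaded nodes.

    Ties are broken by node id so that every participant that sees the
    same utilization snapshot arrives at the same result.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    ranked = sorted(nodes, key=lambda n: (load_score(n, cpu_weight), n.node_id))
    return [n.node_id for n in ranked[:count]]


# Made-up utilization figures for three storage nodes.
cluster = [
    NodeLoad("node-a", cpu=0.90, memory=0.40),
    NodeLoad("node-b", cpu=0.20, memory=0.30),
    NodeLoad("node-c", cpu=0.50, memory=0.10),
]
elected = elect_nodes(cluster, count=2)
print(elected)
```

A deterministic ranking of this kind is one simple way to let several nodes agree on the same choice without extra coordination, provided they share the same utilization view; the system described in the paper may use a different protocol.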