High-performance processing of continuous uncertain data.

Abstract

Uncertain data has arisen in a growing number of applications such as sensor networks, RFID systems, weather radar networks, and digital sky surveys. The fact that the raw data in these applications is often incomplete, imprecise and even misleading has two implications: (i) the raw data is not suitable for direct querying, (ii) feeding the uncertain data into existing systems produces results of unknown quality.

This thesis presents a system for uncertain data processing that has two key functionalities, (i) capturing and transforming raw noisy data to rich queriable tuples that carry attributes needed for query processing with quantified uncertainty, and (ii) performing query processing on such tuples, which captures changes of uncertainty as data goes through various query operators. The proposed system considers data naturally captured by continuous distributions, which is prevalent in sensing and scientific applications.

The first part of the thesis addresses data capture and transformation by proposing a probabilistic modeling and inference approach. Since this task is application-specific and requires domain knowledge, this approach is demonstrated for RFID data from mobile readers. More specifically, the proposed solution involves an inference and cleaning substrate to transform raw RFID data streams to object location tuple streams where locations are inferred from raw noisy data and their uncertain values are captured by probability distributions.

The second, also the main part, of this thesis examines query processing for uncertain data modeled by continuous random variables. The proposed system includes new data models and algorithms for relational processing, with a focus on aggregation and conditioning operations. For operations of high complexity, optimizations including approximations with guaranteed error bounds are considered. Then complex queries involving a mix of operations are addressed by query planning, which given a query, finds an efficient plan that meets user-defined accuracy requirements.

Besides relational processing, this thesis also provides the support for user-defined functions (UDFs) on uncertain data, which aims to compute the output distribution given uncertain input and a black-box UDF. The proposed solution employs a learning-based approach using Gaussian processes to compute approximate output with error bounds, and a suite of optimizations for high performance in online settings such as data stream processing and interactive data analysis.

The techniques proposed in this thesis are thoroughly evaluated using both synthetic data with controlled properties and various real-world datasets from the domains of severe weather monitoring, object tracking using RFID readers, and computational astrophysics. The experimental results show that these techniques can yield high accuracy, meet stream speeds, and outperform existing techniques such as Monte Carlo sampling for many important workloads.
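As a point of reference for the UDF problem described in the abstract, the following is a minimal sketch of the Monte Carlo sampling baseline that the abstract names as an existing technique. It is not the thesis's Gaussian-process method. The function names (`monte_carlo_udf`, `summarize`, `example_udf`), the choice of a single Gaussian input, and the sample count are all assumptions made for illustration.

```python
import random
import statistics


def monte_carlo_udf(udf, mean, stddev, n_samples=10_000, seed=0):
    """Approximate the distribution of udf(X) for X ~ Normal(mean, stddev).

    The UDF is treated as a black box: it is only ever called on sampled inputs.
    Returns the sorted list of sampled outputs (an empirical distribution).
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same samples
    return sorted(udf(rng.gauss(mean, stddev)) for _ in range(n_samples))


def summarize(outputs):
    """Summarize a sorted list of sampled outputs by moments and two quantiles."""
    n = len(outputs)
    return {
        "mean": statistics.fmean(outputs),
        "stddev": statistics.stdev(outputs),
        "p05": outputs[int(0.05 * n)],  # empirical 5th percentile
        "p95": outputs[int(0.95 * n)],  # empirical 95th percentile
    }


def example_udf(x):
    """Hypothetical black-box UDF, used here only for illustration."""
    return x * x


summary = summarize(monte_carlo_udf(example_udf, mean=2.0, stddev=0.5))
print(summary)
```

For this particular UDF the exact output mean is known in closed form (the square of the input mean plus the input variance, 4.25 here), which gives a simple check on the sampled estimate. The cost of the baseline is one UDF call per sample for every uncertain tuple, which is the expense that approximation methods with error bounds aim to reduce.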

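Bibliographic record

  • Author

    Tran, Thanh T. L.

  • Author affiliation

    University of Massachusetts Amherst

  • Degree-granting institution: University of Massachusetts Amherst
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2013
  • Pages: 205 p.
  • Total pages: 205
  • Original format: PDF
  • Language of text: English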
