Provenance and uncertainty.


Abstract

Data provenance, a record of the origin and transformation of data, explains how output data is derived from input data. This dissertation focuses on exploring the connection between provenance and uncertainty in two main directions: (1) how a succinct representation of provenance can help infer uncertainty in the input or the output, and (2) how introducing uncertainty can facilitate publishing provenance information while hiding associated private information.

A significant fraction of the data found in practice is imprecise, unreliable, and incomplete, and therefore uncertain. The level of uncertainty in the data must be measured and recorded in order to estimate the confidence in the results and find potential sources of error. In probabilistic databases, uncertainty in the input is recorded as a probability distribution, and the goal is to efficiently compute the induced probability distribution on the outputs. In general, this problem is computationally hard, and we seek to expand the class of inputs for which efficient evaluation is possible by exploiting provenance structure.

In some scenarios, the output data is directly examined for errors and is labeled accordingly. We need to trace back the errors in the output to the input so that the input can be refined for future processing. Because of incomplete labeling of the output and complexity of the processes generating it, the sources of error may be uncertain. We formalize the problem of source refinement, and propose models and solutions using provenance that can handle incomplete labeling. We also evaluate our solutions empirically for an application of source refinement in information extraction.

Data provenance is extensively used to help understand and debug scientific experiments that often involve proprietary and sensitive information. In this dissertation, we consider privacy of proprietary and commercial modules when they belong to a workflow and interact with other modules. We propose a model for module privacy that makes the exact functionality of the modules uncertain by selectively hiding provenance information. We also study the optimization problem of minimizing the information hidden while guaranteeing a desired level of privacy.

Bibliographic record

  • Author

    Roy, Sudeepa

  • Author affiliation

    University of Pennsylvania

  • Degree-granting institution: University of Pennsylvania
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 303 p.
  • Total pages: 303
  • Original format: PDF
  • Language: English
