IEEE International Symposium on Information Theory

What is the Value of Data? On Mathematical Methods for Data Quality Estimation


Abstract

Data is one of the most important assets of the information age, and its societal impact is undisputed. Yet rigorous methods for assessing the quality of data are lacking. In this paper, we propose a formal definition of the quality of a given dataset. We assess a dataset's quality by a quantity we call the expected diameter, which measures the expected disagreement between two randomly chosen hypotheses that explain the dataset, and which has recently found applications in active learning. We focus on Boolean hyperplanes and use a collection of Fourier-analytic, algebraic, and probabilistic methods to derive theoretical guarantees and practical solutions for computing the expected diameter. We also study the behaviour of the expected diameter on algebraically structured datasets, conduct experiments that validate this notion of quality, and demonstrate the feasibility of our techniques.
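The abstract does not spell out the exact definition, so the following is only a minimal brute-force sketch under stated assumptions, not the paper's method. It assumes (1) Boolean hyperplanes of the form h_w(x) = sign(<w, x>) with weight vectors w in {-1,1}^n and n odd, so the inner product is never zero; (2) the "hypotheses that explain" a dataset are those consistent with every labelled point (the version space), drawn uniformly over weight vectors; and (3) disagreement between two hypotheses is the fraction of points of {-1,1}^n on which they differ. The function name `expected_diameter` is hypothetical. The enumeration is exponential in n, which is exactly why the paper's analytic and probabilistic techniques are needed for realistic sizes.

```python
from fractions import Fraction
from itertools import product


def sign(v):
    return 1 if v > 0 else -1


def expected_diameter(dataset, n):
    """Brute-force expected diameter of the version space (hypothetical sketch).

    dataset: list of (x, y) pairs, x a tuple in {-1,1}^n, y in {-1,1}.
    Assumed hypothesis class: h_w(x) = sign(<w, x>) for w in {-1,1}^n, n odd.
    Returns E_{h,h'} [ Pr_x (h(x) != h'(x)) ] with h, h' drawn independently
    and uniformly from the consistent weight vectors, x uniform on the cube.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that <w, x> is never zero")
    cube = list(product((-1, 1), repeat=n))
    index = {x: i for i, x in enumerate(cube)}

    # Truth table of each hyperplane on the whole cube.
    tables = [
        tuple(sign(sum(wi * xi for wi, xi in zip(w, x))) for x in cube)
        for w in cube
    ]

    # Version space: hypotheses that agree with every labelled example.
    version_space = [
        t for t in tables
        if all(t[index[tuple(x)]] == y for x, y in dataset)
    ]
    if not version_space:
        raise ValueError("no hypothesis is consistent with the dataset")

    # Average pairwise disagreement over ordered pairs (h, h').
    total = Fraction(0)
    for t1 in version_space:
        for t2 in version_space:
            differ = sum(a != b for a, b in zip(t1, t2))
            total += Fraction(differ, len(cube))
    return total / len(version_space) ** 2
```

For n = 3 with no data, every hypothesis is admissible and the sketch returns `Fraction(1, 2)`; a single example `((1, 1, 1), 1)` shrinks the version space to four hyperplanes and the value drops to `Fraction(3, 8)`. Under this reading, a lower expected diameter means the data pins down the hypothesis more tightly.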
