Conference: SIGMOD/PODS

Benchmarking Declarative Approximate Selection Predicates

Abstract

Declarative data quality has been an active research topic. The fundamental principle behind a declarative approach to data quality is the use of declarative statements to realize data quality primitives on top of any relational data source. A primary advantage of such an approach is its ease of use and of integration with existing applications. Over the last few years, several similarity predicates have been proposed for common quality primitives (approximate selections, joins, etc.) and have been fully expressed using declarative SQL statements. In this paper we propose new similarity predicates, along with their declarative realization, based on notions of probabilistic information retrieval. In particular, we show how language models and hidden Markov models can be utilized as similarity predicates for data quality, and we present their full declarative instantiation. We also show how other scoring methods from information retrieval can be utilized in a similar setting. We then present full declarative specifications of similarity predicates previously proposed in the literature, grouping them into classes according to their primary characteristics. Finally, we present a thorough performance and accuracy study comparing a large number of similarity predicates for data cleaning operations. We quantify both their runtime performance and their accuracy for several types of common quality problems encountered in operational databases.
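The abstract describes similarity predicates expressed entirely as SQL over a relational source. As a rough illustration only, and not the paper's own specification, the sketch below uses Python's built-in `sqlite3` module to store each string's q-grams in a token table. It then evaluates two predicates declaratively: a set-based Jaccard coefficient with a threshold (an approximate selection), and a query-likelihood language-model score with Jelinek-Mercer smoothing. The table names (`base`, `base_tf`, `query_tokens`), the padded 2-gram tokenizer, the smoothing weight `lam = 0.8`, the registered SQL function `py_log`, and all helper names are hypothetical choices made for this example.

```python
import math
import sqlite3
from collections import Counter


def qgrams(s: str, q: int = 2) -> Counter:
    """Hypothetical tokenizer: multiset of q-grams of the lowercased, '#'-padded string."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return Counter(padded[i:i + q] for i in range(len(padded) - q + 1))


def build(records: list[str], q: int = 2) -> sqlite3.Connection:
    """Load the base relation and its q-gram token table into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # SQLite's built-in ln() depends on compile options, so register Python's natural log.
    conn.create_function("py_log", 1, math.log)
    conn.execute("CREATE TABLE base(tid INTEGER PRIMARY KEY, str TEXT)")
    conn.execute("CREATE TABLE base_tf(tid INTEGER, token TEXT, tf INTEGER)")
    conn.execute("CREATE TABLE query_tokens(token TEXT PRIMARY KEY)")
    for tid, s in enumerate(records):
        conn.execute("INSERT INTO base VALUES (?, ?)", (tid, s))
        conn.executemany("INSERT INTO base_tf VALUES (?, ?, ?)",
                         [(tid, tok, n) for tok, n in qgrams(s, q).items()])
    return conn


def set_query(conn: sqlite3.Connection, query: str, q: int = 2) -> None:
    """Replace the query token table with the distinct q-grams of the query string."""
    conn.execute("DELETE FROM query_tokens")
    conn.executemany("INSERT INTO query_tokens VALUES (?)", [(t,) for t in qgrams(query, q)])


# Jaccard on q-gram sets: |D ∩ Q| / (|D| + |Q| - |D ∩ Q|), kept only if >= theta.
JACCARD_SQL = """
SELECT tid, score FROM (
  SELECT b.tid AS tid,
         CAST(COUNT(q.token) AS REAL)
           / (s.n + (SELECT COUNT(*) FROM query_tokens) - COUNT(q.token)) AS score
  FROM base_tf b
  JOIN (SELECT tid, COUNT(*) AS n FROM base_tf GROUP BY tid) s ON s.tid = b.tid
  LEFT JOIN query_tokens q ON q.token = b.token
  GROUP BY b.tid, s.n
) WHERE score >= :theta
ORDER BY score DESC, tid
"""

# Query likelihood: sum over query tokens t of log(lam * tf(t,D)/|D| + (1 - lam) * cf(t)/|C|).
# Query tokens absent from the whole relation are dropped (a common simplification).
LM_SQL = """
WITH doc_len AS (SELECT tid, SUM(tf) AS len FROM base_tf GROUP BY tid),
     coll AS (SELECT token, SUM(tf) AS cf FROM base_tf GROUP BY token),
     total AS (SELECT SUM(tf) AS n FROM base_tf)
SELECT d.tid,
       SUM(py_log(:lam * COALESCE(b.tf, 0) / CAST(d.len AS REAL)
                  + (1 - :lam) * c.cf / CAST(t.n AS REAL))) AS score
FROM doc_len d
CROSS JOIN query_tokens q
JOIN coll c ON c.token = q.token
CROSS JOIN total t
LEFT JOIN base_tf b ON b.tid = d.tid AND b.token = q.token
GROUP BY d.tid
ORDER BY score DESC, d.tid
"""


def jaccard_select(conn: sqlite3.Connection, query: str, theta: float, q: int = 2):
    """Approximate selection: tuples whose q-gram Jaccard similarity is at least theta."""
    set_query(conn, query, q)
    return conn.execute(JACCARD_SQL, {"theta": theta}).fetchall()


def lm_rank(conn: sqlite3.Connection, query: str, lam: float = 0.8, q: int = 2):
    """Rank all tuples by smoothed language-model log-likelihood of the query."""
    set_query(conn, query, q)
    return conn.execute(LM_SQL, {"lam": lam}).fetchall()


if __name__ == "__main__":
    conn = build(["John Smith", "Jon Smith", "Jane Doe"])
    print(jaccard_select(conn, "John Smith", 0.5))  # → [(0, 1.0), (1, 0.75)]
    print(lm_rank(conn, "Jon Smyth"))
```

Keeping tokenization in the host language and all scoring in SQL mirrors the idea that a predicate can run on top of any relational source once its token tables exist; swapping in another scoring function (for example a TF-IDF or BM25 weight) would change only the SQL text.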