SIGMOD/PODS 2007

Benchmarking Declarative Approximate Selection Predicates

Abstract

Declarative data quality has been an active research topic. The fundamental principle behind a declarative approach to data quality is the use of declarative statements to realize data quality primitives on top of any relational data source. A primary advantage of such an approach is its ease of use and integration with existing applications. Over the last few years, several similarity predicates have been proposed for common quality primitives (approximate selections, joins, etc.) and have been fully expressed using declarative SQL statements. In this paper we propose new similarity predicates, along with their declarative realization, based on notions of probabilistic information retrieval. In particular, we show how language models and hidden Markov models can be utilized as similarity predicates for data quality, and we present their full declarative instantiation. We also show how other scoring methods from information retrieval can be utilized in a similar setting. We then present full declarative specifications of similarity predicates previously proposed in the literature, grouping them into classes according to their primary characteristics. Finally, we present a thorough performance and accuracy study comparing a large number of similarity predicates for data cleaning operations. We quantify both their runtime performance and their accuracy for several types of common quality problems encountered in operational databases.
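
To make the idea of a similarity predicate "fully expressed using declarative SQL statements" concrete, the following is a minimal sketch of an approximate selection. It is not taken from the paper: it uses a plain q-gram Jaccard overlap chosen for illustration, the table names `base_tokens` and `query_tokens` and the helper functions are hypothetical, and tokenization is done in Python for brevity rather than in SQL. Only the scoring and thresholding are expressed declaratively.

```python
import sqlite3


def qgrams(text, q=2):
    """Return the set of q-grams of a lowercased, padded string."""
    padded = "#" * (q - 1) + text.lower() + "$" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def build(conn, records, q=2):
    """Load (tid, text) records as one row per distinct q-gram (hypothetical schema)."""
    conn.execute("CREATE TABLE base_tokens (tid INTEGER, token TEXT)")
    conn.execute("CREATE TABLE query_tokens (token TEXT)")
    for tid, text in records:
        conn.executemany(
            "INSERT INTO base_tokens VALUES (?, ?)",
            [(tid, tok) for tok in qgrams(text, q)],
        )


# Jaccard = |intersection| / (|record| + |query| - |intersection|).
# COUNT(*) is the intersection size because tokens are distinct per string.
JACCARD_SQL = """
SELECT tid, sim FROM (
    SELECT b.tid AS tid,
           CAST(COUNT(*) AS REAL)
             / (s.n + (SELECT COUNT(*) FROM query_tokens) - COUNT(*)) AS sim
    FROM base_tokens AS b
    JOIN query_tokens AS q ON b.token = q.token
    JOIN (SELECT tid, COUNT(*) AS n FROM base_tokens GROUP BY tid) AS s
      ON s.tid = b.tid
    GROUP BY b.tid, s.n
)
WHERE sim >= ?
ORDER BY sim DESC, tid
"""


def approx_select(conn, query, threshold, q=2):
    """Return (tid, similarity) pairs whose Jaccard score meets the threshold."""
    conn.execute("DELETE FROM query_tokens")
    conn.executemany(
        "INSERT INTO query_tokens VALUES (?)",
        [(tok,) for tok in qgrams(query, q)],
    )
    return conn.execute(JACCARD_SQL, (threshold,)).fetchall()
```

Calling `build` on an in-memory connection (`sqlite3.connect(":memory:")`) with a few `(tid, text)` pairs and then `approx_select(conn, "Microsoft Corp", 0.5)` returns the records whose q-gram sets overlap the query string strongly enough. Because the score is computed by joins and aggregation alone, the same statement could in principle run on any relational engine, which is the portability argument the abstract makes.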