Detecting duplicate records in database

Abstract

The invention concerns the detection of duplicate tuples in a database. Previous domain-independent approaches to detecting duplicate tuples relied on standard similarity functions (e.g., edit distance, cosine metric) computed between multi-attribute tuples. However, such prior-art approaches produce large numbers of false positives when they are used to recognize duplicates that differ through domain-specific abbreviations and conventions. In accordance with the invention, duplicate detection is performed by interpreting records from multiple dimension tables in a data warehouse, which are related through hierarchies specified by key-foreign key relationships in a snowflake schema. The invention exploits the additional knowledge available from this table hierarchy to provide a high-quality, scalable duplicate detection process.
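The abstract does not spell out the procedure, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the patented method. It assumes a two-level slice of a hierarchy (for example, countries and their states), a hypothetical row layout, a simple token Jaccard score standing in for the textual similarity, and a hypothetical "co-occurrence" score that measures how many child rows two candidate parents share. The `weight` and `threshold` values are arbitrary illustrative choices. Rows are compared only when they hang off the same (already deduplicated) parent, which is one way a hierarchy can both cut the number of comparisons and avoid matching, say, two cities named Springfield in different states.

```python
from itertools import combinations


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity of lower-cased whitespace tokens (a standard textual metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def child_overlap(ca: set, cb: set) -> float:
    """Share of the smaller child set that also appears under the other parent."""
    if not ca or not cb:
        return 0.0
    return len(ca & cb) / min(len(ca), len(cb))


def find_duplicates(parent_rows, child_rows, weight=0.5, threshold=0.3):
    """Flag likely duplicate rows in one level of a hypothetical table hierarchy.

    parent_rows: dict key -> (group, name); `group` is the (already deduplicated)
                 key of the row's own parent, so only siblings are compared.
    child_rows:  list of (parent_key, child_name) pairs from the next level down.
    weight, threshold: illustrative values, not taken from the patent.
    """
    # Collect each row's children (lower-cased) from the child table.
    children = {}
    for pkey, cname in child_rows:
        children.setdefault(pkey, set()).add(cname.lower())

    # Group rows by their parent so comparisons stay within one branch.
    groups = {}
    for key, (group, _name) in parent_rows.items():
        groups.setdefault(group, []).append(key)

    flagged = []
    for keys in groups.values():
        for k1, k2 in combinations(sorted(keys), 2):
            text_sim = token_jaccard(parent_rows[k1][1], parent_rows[k2][1])
            co_sim = child_overlap(children.get(k1, set()), children.get(k2, set()))
            # Blend textual evidence with co-occurrence evidence from the children.
            score = weight * text_sim + (1 - weight) * co_sim
            if score >= threshold:
                flagged.append((k1, k2))
    return flagged


countries = {1: (None, "USA"), 2: (None, "United States of America"),
             3: (None, "UK"), 4: (None, "United Kingdom")}
states = [(1, "CA"), (1, "NY"), (1, "TX"), (2, "CA"), (2, "NY"), (2, "WA"),
          (3, "England"), (3, "Scotland"),
          (4, "England"), (4, "Scotland"), (4, "Wales")]
print(find_duplicates(countries, states))  # [(1, 2), (3, 4)]
```

Token similarity alone scores "USA" against "United States of America" at zero, yet their shared states lift the pair over the threshold, while "United States of America" and "United Kingdom", which share the token "united" but no states, stay below it. Grouping by parent means `{10: ("IL", "Springfield"), 11: ("MO", "Springfield")}` is never compared at all.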