Probabilistic Iterative Duplicate Detection

Abstract

The problem of identifying approximately duplicate records between databases is known, among other names, as duplicate detection or record linkage. To this end, typically either rules or a weighted aggregation of distances between the individual attributes of potential duplicates is used. However, choosing the appropriate rules, distance functions, weights, and thresholds requires a deep understanding of the application domain or a good representative training set for supervised learning approaches. In this paper we present an unsupervised, domain-independent approach that starts with a broad alignment of potential duplicates, and analyses the distribution of observed distances among potential duplicates and among non-duplicates to iteratively refine the initial alignment. Evaluations show that this approach outperforms other unsupervised approaches and reaches almost the same accuracy as even fully supervised, domain-dependent approaches.
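
The general idea of iteratively refining a broad initial alignment can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify in detail. It is a minimal illustration under stated assumptions: per-attribute distances are normalized to [0, 1], the distance distributions are estimated as smoothed histograms, attributes are treated as independent (naive Bayes), and the function name `refine_alignment` and its parameters are hypothetical.

```python
from typing import List, Sequence


def refine_alignment(
    distances: Sequence[Sequence[float]],
    initial_threshold: float = 0.5,
    bins: int = 5,
    max_iters: int = 20,
) -> List[bool]:
    """Relabel candidate pairs as duplicate (True) or non-duplicate (False).

    distances[i][k] is the distance of candidate pair i on attribute k,
    assumed to be normalized to the range [0, 1].
    """
    if not distances:
        return []
    n = len(distances)
    n_attrs = len(distances[0])

    def bin_of(d: float) -> int:
        # Map a distance to a histogram bin, clamping 1.0 into the last bin.
        return min(int(d * bins), bins - 1)

    # Broad initial alignment: small mean distance means "potential duplicate".
    labels = [sum(row) / n_attrs < initial_threshold for row in distances]

    for _ in range(max_iters):
        n_dup = sum(labels)
        n_non = n - n_dup
        if n_dup == 0 or n_non == 0:
            break  # one class is empty, nothing left to compare

        # Per-attribute distance histograms for each class (add-one smoothing).
        dup_hist = [[1.0] * bins for _ in range(n_attrs)]
        non_hist = [[1.0] * bins for _ in range(n_attrs)]
        for row, is_dup in zip(distances, labels):
            hist = dup_hist if is_dup else non_hist
            for k, d in enumerate(row):
                hist[k][bin_of(d)] += 1.0

        # Re-score every pair: prior odds times per-attribute likelihood ratios.
        new_labels = []
        for row in distances:
            odds = n_dup / n_non
            for k, d in enumerate(row):
                b = bin_of(d)
                p_dup = dup_hist[k][b] / (n_dup + bins)
                p_non = non_hist[k][b] / (n_non + bins)
                odds *= p_dup / p_non
            new_labels.append(odds > 1.0)

        if new_labels == labels:
            break  # alignment is stable
        labels = new_labels
    return labels


# Made-up example data: three close pairs, four distant pairs, one mixed pair.
pairs = [
    [0.05, 0.10], [0.10, 0.00], [0.00, 0.15],
    [0.90, 0.80], [0.85, 0.95], [0.95, 0.90], [0.80, 0.85],
    [0.25, 0.90],  # starts as a potential duplicate under the broad threshold
]
result = refine_alignment(pairs, initial_threshold=0.6)
```

In this toy input the last pair passes the broad initial threshold but is relabeled as a non-duplicate once the observed distance distributions of the two groups are taken into account. The histogram binning and the odds threshold of 1.0 are arbitrary choices made for illustration.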

Bibliographic details

Source: OTM (On the Move) Confederated International Conference: CoopIS (Cooperative Information Systems), DOA (Distributed Objects and Applications), and ODBASE (Ontologies, DataBases and Applications of SEmantics) 2005, Part 2; 2005-10-31 to 2005-11-04; Agia Napa (CY) | 2005 | pp. 1225-1242 | 18 pages
Conference location: Agia Napa (CY)
Authors: Patrick Lehti; Peter Fankhauser
Author affiliation: Fraunhofer IPSI, Dolivostr. 15, Darmstadt, Germany
Original format: PDF
Language: English
Chinese Library Classification: Computer networks
Date indexed: 2022-08-26 14:12:10

Similar literature

1. DB2: a probabilistic approach for accurate detection of tandem duplication breakpoints using paired-end reads [J]. Gökhan Yavaş, Mehmet Koyutürk, Meetha P Gould. BMC Genomics. 2014, No. 1
2. Scalable Iterative Graph Duplicate Detection [J]. Herschel Melanie, Naumann Felix, Szott Sascha. IEEE Transactions on Knowledge and Data Engineering. 2012, No. 11
3. Iterative joint integrated probabilistic data association filter for multiple-detection multiple-target tracking [J]. Xie Yifan, Huang Yuan, Song Taek Lyul. Digital Signal Processing. 2018
4. Probabilistic Iterative Duplicate Detection [C]. Patrick Lehti, Peter Fankhauser. On the Move Federated Conferences; International Conference on Cooperative Information Systems; International Conference on Distributed Objects and Applications. 2005
5. Novel Class Detection and Cross-Lingual Duplicate Detection Over Online Data Stream [D]. Mustafa, Ahmad Mohammad. 2018
6. DB2: a probabilistic approach for accurate detection of tandem duplication breakpoints using paired-end reads [O]. Gökhan Yavaş, Mehmet Koyutürk, Meetha P Gould. 2014
7. Scalable Iterative Graph Duplicate Detection [O]. Melanie Herschel, Felix Naumann, Sascha Szott. 2014
