In this paper, we developed a robust data cleaning technique, called PC-Filter+ (PC stands for partition comparison) based on its predecessor, for effective and efficient duplicate record detection in large databases. PC-Filter+ provides more flexible algorithmic options for constructing the Partition Comparison Graph (PCG). In addition, PC-Filter+ is able to deal with duplicate detection under different memory constraints.
展开▼