Source: International Conference on Machine Vision

DUPLICATE RECORD DETECTION FOR DATABASE CLEANSING

Abstract

Many organizations collect large amounts of data to support their business and decision-making processes. Data collected from various sources may have quality problems, and these problems become prominent when multiple databases are integrated, because the integrated database inherits the data quality problems present in the source databases. The data in integrated systems must therefore be cleaned to support sound decision making, which makes data cleansing one of the most crucial steps. This research focuses on one of the major issues of data cleansing, duplicate record detection, which arises when data is collected from various sources. The study compares the standard duplicate elimination algorithm (SDE), the sorted neighborhood algorithm (SNA), the duplicate elimination sorted neighborhood algorithm (DE-SNA), and the adaptive duplicate detection algorithm (ADD). A prototype was also developed, and it shows that the adaptive duplicate detection algorithm is the optimal solution to the problem of duplicate record detection. For approximate matching of data records, two string matching algorithms were implemented (a recursive algorithm with word base and a recursive algorithm with character base), and the results were found to be much better with the recursive algorithm with word base.
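To make the sorted neighborhood idea concrete, the following is a minimal Python sketch, not the prototype described in the abstract. It assumes records are plain strings, sorts them by a key, and compares each record only with the next few records in sorted order. The names `word_similarity`, `sorted_neighborhood`, `sort_key`, `window`, and `threshold` are hypothetical, and the word-level Jaccard overlap is a simple stand-in for approximate matching rather than the paper's recursive word-base algorithm.

```python
def word_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (illustrative stand-in matcher)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def sorted_neighborhood(records, key, window=3, threshold=0.6):
    """Return index pairs of likely duplicates.

    Records are sorted by `key`; each record is compared only with the
    next `window - 1` records in that order, not with every other record.
    """
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    pairs = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1 : pos + window]:
            if word_similarity(records[i], records[j]) >= threshold:
                pairs.add((min(i, j), max(i, j)))
    return sorted(pairs)


def sort_key(record: str) -> str:
    """Assumed sorting key: the record's words, lowercased and sorted."""
    return " ".join(sorted(record.lower().split()))


# Made-up sample data for illustration only.
records = [
    "John A Smith 12 Oak Street",
    "Maria Lopez 4 Elm Road",
    "Smith John A 12 Oak Street",
    "Maria Lopez 4 Elm Rd",
]
duplicates = sorted_neighborhood(records, key=sort_key)
print(duplicates)
```

The window size trades recall for speed: a larger window catches duplicates that sort further apart, at the cost of more comparisons per record.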