Journal of Theoretical and Applied Information Technology

IMPROVE THE QUALITY OF STATISTICAL METHOD OF OBTAINING REPRESENTATIVE DATA SCHEME FOR DE-DUPLICATION USING FUZZY CLUSTERING AND GENETIC ALGORITHM

Abstract

Record de-duplication is an important task when merging records from different databases. After the de-duplication operation is carried out, tuned results can be provided to users. Existing approaches fail at tuning web databases and removing duplicate records, and they do not provide efficient and effective results [1] [2] [3] [4]. In this paper we design a new prototype for effective and enhanced de-duplication. The prototype design starts with fuzzy clustering and a genetic algorithm. It can handle a larger number of duplicate records than other approaches, and it saves more storage and time [12] [13]. In distributed databases, the complexity of finding the similarity factor is very high, and the existing techniques are not accurate enough to minimize duplication within the same database. In the present work, a new technique is proposed to improve the accuracy level [24]. The proposed work implements a multi-level technical process, such as tuning. The tuning technique finds all types of duplicated documents in the database: all duplicate files are searched across all attributes, in sequential order, in a tree fashion. The results are further improved and brought into an optimized and acceptable range by a new duplicate-data detection method using a Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), which also removes unwanted residual files from the database. Based on the problems of previous ranking systems, a new manifold ranking is proposed in the current research work. In the proposed system, the ranking is evaluated with a new multimodality manifold ranking with sink points.
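
To make the genetic-algorithm step more concrete, the following is a minimal Python sketch of one common way a GA can be applied to duplicate detection: evolving per-attribute weights and a decision threshold for a weighted similarity score. It is not the authors' method, which the abstract does not specify in detail. The record pairs, the chromosome layout, the function names (`similarity`, `is_duplicate`, `fitness`, `evolve`) and all parameter values are assumptions made for illustration, and the fuzzy clustering and PSO stages are omitted.

```python
import random
from difflib import SequenceMatcher

# Hypothetical labelled record pairs: (record_a, record_b, is_duplicate).
PAIRS = [
    (("John Smith", "12 Oak St"), ("Jon Smith", "12 Oak Street"), True),
    (("Mary Jones", "4 Elm Rd"), ("Mary Jones", "4 Elm Road"), True),
    (("John Smith", "12 Oak St"), ("Jane Smyth", "98 Pine Ave"), False),
    (("Mary Jones", "4 Elm Rd"), ("Mark Johns", "77 Lake Dr"), False),
]


def similarity(a, b):
    """Per-attribute string similarity, each value in [0, 1]."""
    return [SequenceMatcher(None, x, y).ratio() for x, y in zip(a, b)]


def is_duplicate(chrom, a, b):
    """chrom = [w_1, ..., w_n, threshold]; compare weighted mean similarity to threshold."""
    *weights, threshold = chrom
    total = sum(weights) or 1.0  # avoid division by zero if all weights are 0
    score = sum(w * s for w, s in zip(weights, similarity(a, b))) / total
    return score >= threshold


def fitness(chrom, pairs=PAIRS):
    """Fraction of labelled pairs that the chromosome classifies correctly."""
    correct = sum(is_duplicate(chrom, a, b) == label for a, b, label in pairs)
    return correct / len(pairs)


def evolve(n_attrs=2, pop_size=20, generations=30, seed=0):
    """Assumed GA settings: elitism, one-point crossover, Gaussian mutation."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_attrs + 1)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        elite = pop[: pop_size // 2]  # keep the best half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n_attrs + 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            i = rng.randrange(len(child))  # mutate one gene, clamped to [0, 1]
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best), history


if __name__ == "__main__":
    best, score, _ = evolve()
    print("weights:", best[:-1], "threshold:", best[-1], "accuracy:", score)
```

Because the best half of each generation is carried over unchanged, the best fitness in this sketch never decreases from one generation to the next. A realistic setup would score candidates on held-out labelled pairs rather than on the same pairs used for the search.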