IMPROVE THE QUALITY OF STATISTICAL METHOD OF OBTAINING REPRESENTATIVE DATA SCHEME FOR DE-DUPLICATION USING FUZZY CLUSTERING AND GENETIC ALGORITHM

RAVIKANTH.M; DR.D.VASUMATHI

Abstract

Record de-duplication is an important task when merging records from different databases. Once de-duplication has been performed, tuned results can be provided to users. Existing approaches fail at tuning web databases and removing duplicate records, and none of them gives efficient and effective results [1] [2] [3] [4]. In this paper we design a new prototype for effective, enhanced de-duplication. The prototype design starts with fuzzy clustering and a genetic algorithm. It handles more duplicate records than other approaches, and it saves more storage and time [12] [13]. In distributed databases, the complexity of finding the similarity factor is very high, and existing techniques are not accurate enough to minimize duplication within the same database. The present work proposes a new technique to improve the accuracy level [24]. The proposed work implements a multi-level process such as tuning. The tuning technique finds all types of duplicated documents in the database: all duplicate files are searched on all attributes, in sequential order, in a tree fashion. The results are further improved, reaching an optimized and acceptable range, with a new duplicate-detection method based on a Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), which also removes unwanted residual files from the database. Based on the problems of previous ranking systems, a new manifold ranking is proposed in the current research work: in the proposed system, ranking is evaluated with a new multimodality manifold ranking with sink points.
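
The abstract does not give the details of the fuzzy clustering step, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that records are compared attribute by attribute with a simple string similarity, and that the resulting pair scores are split into a "distinct" and a "duplicate" group with standard two-cluster fuzzy c-means. The function names, the sample records, and the 0.5 membership cut-off are all hypothetical; the genetic algorithm and PSO stages are not shown.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Mean string similarity over the attributes of two records (0..1)."""
    return sum(SequenceMatcher(None, x.lower(), y.lower()).ratio()
               for x, y in zip(a, b)) / len(a)


def fuzzy_c_means_1d(values, m=2.0, iterations=50):
    """Two-cluster fuzzy c-means on scalar values.

    Returns (centers, memberships); memberships[i][k] is the degree to
    which values[i] belongs to cluster k, and each row sums to 1.
    """
    if min(values) == max(values):
        raise ValueError("need at least two distinct values")
    centers = [min(values), max(values)]  # deterministic initialisation
    memberships = []
    for _ in range(iterations):
        memberships = []
        for v in values:
            dists = [abs(v - c) for c in centers]
            if 0.0 in dists:
                # v coincides with a center: full membership in that cluster
                hit = dists.index(0.0)
                row = [1.0 if k == hit else 0.0 for k in range(2)]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                row = [w / total for w in inv]
            memberships.append(row)
        # Move each center to the membership-weighted mean of the values
        centers = [
            sum((u[k] ** m) * v for u, v in zip(memberships, values))
            / sum(u[k] ** m for u in memberships)
            for k in range(2)
        ]
    return centers, memberships


# Hypothetical sample records: (name, address, city)
records = [
    ("John Smith", "12 Main Street", "Springfield"),
    ("Jon Smith", "12 Main St.", "Springfield"),
    ("Maria Garcia", "48 Oak Avenue", "Riverside"),
    ("Wei Chen", "7 Harbour Road", "Portland"),
]

pairs = list(combinations(range(len(records)), 2))
scores = [similarity(records[i], records[j]) for i, j in pairs]
centers, memberships = fuzzy_c_means_1d(scores)

# Treat the cluster with the higher center as the "duplicate" cluster
dup = centers.index(max(centers))
likely_duplicates = [pairs[n] for n, u in enumerate(memberships) if u[dup] > 0.5]
print(likely_duplicates)
```

Because the memberships are graded rather than binary, borderline pairs could be passed to a later stage (for instance a search that tunes attribute weights) instead of being accepted or rejected outright; that hand-off is an assumption of this sketch, not something the abstract specifies.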

Bibliographic details

Source
Journal of Theoretical and Applied Information Technology, 2017, Issue 8
Authors
RAVIKANTH.M; DR.D.VASUMATHI
Original format
PDF
CLC classification
Computing technology, computer technology

Similar literature

1. Hybrid fuzzy clustering methods based on improved self-adaptive cellular genetic algorithm and optimal-selection-based fuzzy c-means [J]. Jie Lilin, Liu Weidong, Sun Zheng. Neurocomputing, 2017, Issue AUGa2.
2. Web Usage Data Clustering Using Improved Genetic Fuzzy C-Means Algorithm [J]. Karunesh Gupta, Manish Shrivastava. International Journal of Advanced Computer Research, 2013, Issue 12.
3. Web Usage Data Clustering Using Improved Genetic Fuzzy C-Means Algorithm [J]. Karunesh Gupta, Manish Shrivastava. International Journal of Advanced Computer Research, 2012, Issue 4.
4. An improved density-based cluster analysis method combining genetic algorithm and data sampling for large-scale datasets [C]. Ye Zonglin, Cao Hui, Wang Miaomiao. Chinese Control Conference, 2013.
5. Feature extraction with genetic algorithms for fuzzy clustering [D]. Velthuizen, Robert Paul. 1996.
6. Does Determination of Initial Cluster Centroids Improve the Performance of K-Means Clustering Algorithm? Comparison of Three Hybrid Methods by Genetic Algorithm, Minimum Spanning Tree and Hierarchical Clustering in an Applied Study [O]. Saeedeh Pourahmad, Atefeh Basirat, Amir Rahimi. 2020.
7. Data Cube Clustering with Improved DBSCAN based on Fuzzy Logic and Genetic Algorithm [O]. Mina Hosseini Rad, Majid Abdolrazzagh-Nezhad. 2020.
