Detecting Approximately Duplicate Records in Database

Abstract

Existing database systems hold huge volumes of data, much of it duplicated. Traditional approaches to detecting approximately duplicate records incur very high time and space complexity on such databases and do not give good results. This chapter presents a method for detecting approximately duplicate records based on an improved genetic neural network: a genetic algorithm optimizes the network's initial weights, and the BP (backpropagation) algorithm then trains on the detection data to obtain the network model. Experimental results show that the method can effectively detect approximately duplicate records in very large data sets.
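
The two-stage idea in the abstract (a genetic algorithm chooses the starting weights, then BP refines them) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not give the record encoding, network size, or genetic operators, so the two similarity features, the 2-3-1 layout, truncation selection, uniform crossover, Gaussian mutation, the toy data, and all function names are assumptions.

```python
import math
import random

# Assumed layout: 2 similarity features -> 3 hidden units -> 1 output.
N_IN, N_HID = 2, 3
N_WEIGHTS = N_HID * (N_IN + 1) + (N_HID + 1)  # weights plus bias terms

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, x):
    """Return (hidden activations, output) for one feature vector x."""
    hidden = []
    for j in range(N_HID):
        base = j * (N_IN + 1)
        z = w[base + N_IN] + sum(w[base + i] * x[i] for i in range(N_IN))
        hidden.append(sigmoid(z))
    base = N_HID * (N_IN + 1)
    z_out = w[base + N_HID] + sum(w[base + j] * hidden[j] for j in range(N_HID))
    return hidden, sigmoid(z_out)

def mse(w, data):
    return sum((forward(w, x)[1] - y) ** 2 for x, y in data) / len(data)

def ga_initial_weights(data, rng, pop_size=20, generations=30):
    """Elitist GA (assumed operators): search for a low-error starting point."""
    pop = [[rng.uniform(-1, 1) for _ in range(N_WEIGHTS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: mse(w, data))
        parents = pop[: pop_size // 2]  # truncation selection keeps the best half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            # Gaussian mutation applied to roughly 10% of the genes
            child = [g + rng.gauss(0, 0.1) if rng.random() < 0.1 else g for g in child]
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda w: mse(w, data))

def bp_train(w, data, lr=0.5, epochs=500):
    """Per-sample backpropagation on squared error, starting from weights w."""
    w = list(w)
    out_base = N_HID * (N_IN + 1)
    for _ in range(epochs):
        for x, y in data:
            hidden, out = forward(w, x)
            d_out = (out - y) * out * (1 - out)
            for j in range(N_HID):
                # Hidden delta uses the output weight before it is updated.
                d_hid = d_out * w[out_base + j] * hidden[j] * (1 - hidden[j])
                base = j * (N_IN + 1)
                for i in range(N_IN):
                    w[base + i] -= lr * d_hid * x[i]
                w[base + N_IN] -= lr * d_hid
                w[out_base + j] -= lr * d_out * hidden[j]
            w[out_base + N_HID] -= lr * d_out
    return w

# Toy record pairs, invented for illustration:
# (name similarity, address similarity) -> 1 if duplicate, else 0.
DATA = [((0.95, 0.90), 1), ((0.90, 0.85), 1), ((0.85, 0.95), 1),
        ((0.20, 0.10), 0), ((0.30, 0.25), 0), ((0.10, 0.35), 0)]

rng = random.Random(0)
w0 = ga_initial_weights(DATA, rng)   # stage 1: GA picks the initial weights
w = bp_train(w0, DATA)               # stage 2: BP refines them
score = forward(w, (0.92, 0.88))[1]  # closer to 1 means "likely duplicate"
```

In this sketch the GA only supplies the starting point. Seeding BP this way is commonly motivated by BP's sensitivity to its initial weights, though the abstract does not state the authors' reasoning.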

Bibliographic details

Source: International conference on information engineering and applications, 2013, pp. 325-332 (8 pages)
Authors: Xingrui Liu; Lijun Xu
Original format: PDF
Keywords: Approximately duplicate records; BP algorithm; Neural network; Initial weights

Similar literature

1. An n-gram-based approach for detecting approximately duplicate database records [J]. Zengping Tian, Hongjun Lu, Wenyun Ji. International Journal on Digital Libraries. 2002, Issue 4
2. Detecting Duplicates and near Duplicates Records in Large Datasets [J]. Shailesh Singh, Syed Imtiyaz Hassan. International Journal on Computer Science and Engineering. 2017, Issue 5
3. The distribution of bibliographic records in databases using different counting methods for duplicate records [J]. W. W. Hood, Concepcion S. Wilson. Scientometrics. 1999, Issue 3
4. Detecting Approximately Duplicate Records in Database [C]. Xingrui Liu, Lijun Xu. International conference on information engineering and applications. 2013
5. Electronic Documentation Support Tools and Text Duplication in the Electronic Medical Record [D]. Wrenn, Jesse. 2010
6. Sarcopenia, frailty and cachexia patients detected in a multisystem electronic health record database [O]. Ranjani N. Moorthi, Ziyue Liu, Sarah A. El-Azab, 2020
7. Detecting dispersed duplications in high-throughput sequencing data using a database-free approach [O]. Kroon M., Lameijer E-W., Lakenberg N., 2016
8. Global Ecosystems Database. Version 0.1 (Beta-test). EPA Global Climate Research Program. NOAA/NGDC Global Change Database Program. Prototype 1. Database Documentation. NGDC Key to Geophysical Records Documentation No. 25 [R]. Campbell, W. G., Kineman, J. J. 1991