A Semantic Deduplication of Temporal Dynamic Records from Multiple Web Databases

R. Parimala Devi; V. Thigarasu

Abstract

Objective: The main objective of this paper is to improve the true positive level of record deduplication using an ontology-based MHMM-Fuzzy clustering approach.

Methods/Statistical Analysis: Most record deduplication systems in the literature use genetic programming, which combines different pieces of evidence extracted from the data content. However, the accuracy of such systems is low. To overcome this problem, we propose a Multiple Hidden Markov Model (MHMM), which increases accuracy and also identifies joint duplicate records. When the database has multiple columns, however, this model performs deduplication on all columns, which degrades the performance of the system. To solve this problem, MHMM-Fuzzy clustering based record deduplication is introduced. In this system, fuzzy clustering is performed on multiple observations from the Hidden Markov Model. Duplicate records are then grouped into one cluster according to their fuzzy logic, so they can be eliminated easily. However, the true positive level of this system is still low. To improve it, fuzzy ontology based semantic similarity is incorporated into the MHMM-Fuzzy clustering approach, which increases the efficiency of the deduplication function that identifies replicas and duplicate records.

Findings: MHMM based record deduplication, MHMM-Fuzzy clustering based record deduplication and the ontology-based MHMM-Fuzzy clustering approach are applied to the Cora bibliographic dataset and the Restaurants dataset. Performance is evaluated in terms of precision, recall, F-measure, execution time and accuracy.

Applications/Improvements: The proposed approach achieves better record deduplication results than previous work in terms of precision, recall, F-measure, execution time and accuracy.
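The precision, recall and F-measure named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the authors compute these measures; the pairwise definition used here (a true positive is a pair of records placed in the same cluster by both the method and the ground truth), the function names, and the toy record IDs are all assumptions made for illustration, not the paper's implementation.

```python
from itertools import combinations


def duplicate_pairs(clusters):
    """Return the set of unordered record-ID pairs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        # Sorting gives each pair a canonical (smaller, larger) order.
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs


def pairwise_scores(predicted_clusters, true_clusters):
    """Pairwise precision, recall and F-measure (assumed definition)."""
    predicted = duplicate_pairs(predicted_clusters)
    actual = duplicate_pairs(true_clusters)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical toy data: records 1-3 are true duplicates, as are 4-5.
true_clusters = [{1, 2, 3}, {4, 5}, {6}]
# Hypothetical output of a deduplication method.
predicted_clusters = [{1, 2}, {3, 4, 5}, {6}]

precision, recall, f_measure = pairwise_scores(predicted_clusters, true_clusters)
print(precision, recall, f_measure)
```

In this toy case the method proposes four duplicate pairs, two of which are correct, and the ground truth also contains four pairs, so all three measures equal 0.5.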

Bibliographic details

Source: Indian Journal of Science and Technology, 2015, Issue 34
Authors: R. Parimala Devi; V. Thigarasu
Original format: PDF
CLC classification: Serial publications

Similar documents

1. A Semantic Deduplication of Temporal Dynamic Records from Multiple Web Databases [J]. R. Parimala Devi, V. Thigarasu. Indian Journal of Science and Technology. 2015, Issue 34.
2. Semantic-JSON: a lightweight web service interface for Semantic Web contents integrating multiple life science databases [J]. Akihiro Matsushima, Manabu Ishii, Norio Kobayashi. Nucleic acids research. 2011, Issue suppla2.
3. Multiple Web Database Handle Using CTVS Method and Record Matching [J]. Harish Chaware, Prof. Nitin Chopade. International Journal of Engineering Research and Applications. 2013, Issue 3.
4. Dart Database Grid: A Dynamic, Adaptive, RDF-Mediated, Transparent Approach to Database Integration for Semantic Web [C]. Zhaohui Wu, Huajun Chen, Yuxing Mao. Asia-Pacific Web Conference; 20050329-0401; Shanghai (CN). 2005.
5. Spatio-temporal synchronization and semantic modeling for video and multimedia database systems [D]. Radev, Ivan Stefanov. 1998.
6. Semantic-JSON: a lightweight web service interface for Semantic Web contents integrating multiple life science databases [O]. Norio Kobayashi, Manabu Ishii, Satoshi Takahashi. 2011.
7. Semantic-JSON: a lightweight web service interface for Semantic Web contents integrating multiple life science databases [O]. Kobayashi, Norio, Ishii, Manabu, Takahashi, Satoshi. 2011.
