Published in: International Journal on Computer Science and Engineering

Record Matching Over Query Results Using Fuzzy Ontological Document Clustering

Abstract

Record matching is an essential step in duplicate detection, as it identifies records that represent the same real-world entity. Supervised record matching methods require users to provide training data and therefore cannot be applied to web databases, where query results are generated on the fly. To overcome this problem, a new record matching method named Unsupervised Duplicate Elimination (UDE) is proposed for identifying and eliminating duplicates among records in dynamic query results. The key idea is to adjust the weights of record fields when calculating similarities among records. Two classifiers, a weighted component similarity summing classifier and a support vector machine (SVM) classifier, are employed iteratively in UDE. The first classifier uses the current set of field weights to match records from different data sources. Using the matched records as the positive set and non-duplicate records as the negative set, the second classifier identifies new duplicates. A new methodology is then presented to automatically interpret and cluster knowledge documents using an ontology schema. In addition, a fuzzy logic control approach is used to match suitable document cluster(s) to given patents based on their derived ontological semantic webs. In this way, the paper exploits the similarity among records from web databases to address the online duplicate detection problem.
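The weighted matching loop described above can be illustrated with a small Python sketch. It is not the authors' implementation, only an assumption-laden illustration: the field names and example records are hypothetical, `difflib.SequenceMatcher.ratio()` stands in for whatever field similarity the paper uses, and the weighting rule (fields that agree on known duplicates and disagree on non-duplicates gain weight, blended by a hypothetical factor `alpha`) is an assumed scheme. Records from the same source are assumed to be non-duplicates to seed the negative set. The SVM classifier is omitted, so each iteration only re-weights the fields from the pairs matched so far and reruns the weighted component similarity summing classifier until no new duplicates appear.

```python
from difflib import SequenceMatcher

# Hypothetical schema: these field names are illustrative only.
FIELDS = ["title", "author", "year"]
Record = dict[str, str]


def field_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (a stand-in for the paper's measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_vector(r1: Record, r2: Record) -> list[float]:
    """One similarity score per field for a record pair."""
    return [field_similarity(r1[f], r2[f]) for f in FIELDS]


def _normalize(xs: list[float]) -> list[float]:
    """Scale scores to sum to 1; fall back to uniform weights when all scores are 0."""
    total = sum(xs)
    return [x / total for x in xs] if total > 0 else [1.0 / len(xs)] * len(xs)


def field_weights(dups: list[list[float]], non_dups: list[list[float]], alpha: float = 0.5) -> list[float]:
    """Assumed scheme: reward fields that agree on duplicates and disagree on non-duplicates."""
    n = len(FIELDS)
    agree = [sum(v[i] for v in dups) for i in range(n)]
    disagree = [sum(1.0 - v[i] for v in non_dups) for i in range(n)]
    return [alpha * d + (1 - alpha) * m for d, m in zip(_normalize(agree), _normalize(disagree))]


def wcss_match(vec: list[float], weights: list[float], threshold: float) -> bool:
    """Weighted component similarity summing: weighted sum of field similarities vs a threshold."""
    return sum(w * s for w, s in zip(weights, vec)) >= threshold


def ude_sketch(source_a: list[Record], source_b: list[Record],
               threshold: float = 0.85, max_iter: int = 10) -> list[tuple[int, int]]:
    # Assumption for this sketch: records within one source are non-duplicates (negative seed set).
    non_dups = [similarity_vector(x, y)
                for src in (source_a, source_b)
                for i, x in enumerate(src) for y in src[i + 1:]]
    matched: set[tuple[int, int]] = set()
    for _ in range(max_iter):
        dups = [similarity_vector(source_a[i], source_b[j]) for i, j in sorted(matched)]
        weights = field_weights(dups, non_dups)  # re-weight fields from the current matches
        new = {(i, j)
               for i, ra in enumerate(source_a)
               for j, rb in enumerate(source_b)
               if (i, j) not in matched
               and wcss_match(similarity_vector(ra, rb), weights, threshold)}
        if not new:
            break  # converged: re-weighting found no further cross-source duplicates
        matched |= new
    return sorted(matched)


# Hypothetical example records returned by two web sources for the same query.
SOURCE_A = [
    {"title": "Data Cleaning Basics", "author": "J. Doe", "year": "2010"},
    {"title": "Fuzzy Ontology Clustering", "author": "A. Kumar", "year": "2012"},
]
SOURCE_B = [
    {"title": "data cleaning basics", "author": "J. Doe", "year": "2010"},
    {"title": "Support Vector Machines", "author": "C. Lee", "year": "1995"},
]

print(ude_sketch(SOURCE_A, SOURCE_B))  # → [(0, 0)]
```

Because the weights are non-negative and sum to one, a pair passes the threshold only if its fields are similar overall; re-weighting changes which fields dominate that sum rather than inflating it.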