IEEE Transactions on Knowledge and Data Engineering

Record Matching over Query Results from Multiple Web Databases

Abstract

Record matching, which identifies the records that represent the same real-world entity, is an important step in data integration. Most state-of-the-art record matching methods are supervised and therefore require the user to provide training data. These methods are not applicable to the Web database scenario, where the records to match are query results generated on the fly. Such records are query-dependent, so a method prelearned from the training examples of previous query results may fail on the results of a new query. To address record matching in the Web database scenario, we present UDD, an unsupervised, online record matching method that, for a given query, can effectively identify duplicates among the query result records of multiple Web databases. After same-source duplicates are removed, the "presumed" nonduplicate records from the same source can be used as training examples, relieving users of the burden of manually labeling them. Starting from this nonduplicate set, we use two cooperating classifiers, a weighted component similarity summing classifier and an SVM classifier, to iteratively identify duplicates in the query results from multiple Web databases. Experimental results show that UDD works well in the Web database scenario, where existing supervised methods do not apply.
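The abstract only outlines UDD's iterative loop. The Python sketch below illustrates the weighted-component-similarity-summing idea under stated assumptions: records are dicts of string fields, field similarity is token-level Jaccard, and the field-weighting rule (`field_weights`), `alpha`, and `threshold` are hypothetical stand-ins rather than the paper's exact definitions. The SVM classifier that UDD runs alongside this step is omitted, so this is a simplified single-classifier loop, not UDD itself.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity of two field values (assumed field metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def sim_vector(r1: dict, r2: dict, fields: list) -> list:
    """Per-field similarity vector for a record pair."""
    return [jaccard(r1[f], r2[f]) for f in fields]


def normalize(xs: list) -> list:
    total = sum(xs)
    return [x / total for x in xs] if total > 0 else [1.0 / len(xs)] * len(xs)


def field_weights(dup_vecs, nondup_vecs, n_fields, alpha=0.5):
    """Hypothetical weighting: a field gets more weight when duplicates agree
    on it and when presumed nonduplicates disagree on it."""
    w_dup = normalize([sum(v[i] for v in dup_vecs) for i in range(n_fields)])
    w_non = normalize([sum(1.0 - v[i] for v in nondup_vecs) for i in range(n_fields)])
    if not dup_vecs:
        return w_non
    if not nondup_vecs:
        return w_dup
    return [alpha * d + (1 - alpha) * n for d, n in zip(w_dup, w_non)]


def wcss_score(vec, weights) -> float:
    """Weighted sum of component similarities; in [0, 1] since weights sum to 1."""
    return sum(w * v for w, v in zip(weights, vec))


def iterative_match(sources, fields, threshold=0.8, alpha=0.5):
    """sources: one list of record dicts per Web database, with same-source
    duplicates already removed. Returns cross-source duplicate pairs as
    ((source, index), (source, index)) tuples."""
    # Presumed nonduplicates: every pair of records within one source.
    nondup = [sim_vector(a, b, fields)
              for src in sources for a, b in combinations(src, 2)]
    # Candidate pairs: records drawn from two different sources.
    tagged = [(s, i, rec) for s, src in enumerate(sources) for i, rec in enumerate(src)]
    candidates = {
        ((s1, i1), (s2, i2)): sim_vector(r1, r2, fields)
        for (s1, i1, r1), (s2, i2, r2) in combinations(tagged, 2)
        if s1 != s2
    }
    duplicates = set()
    while True:
        # Re-derive weights from the duplicates found so far and the nonduplicates.
        weights = field_weights([candidates[k] for k in duplicates], nondup, len(fields), alpha)
        new = {k for k, v in candidates.items()
               if k not in duplicates and wcss_score(v, weights) >= threshold}
        if not new:  # no new duplicates: the iteration has converged
            return duplicates
        duplicates |= new


# Toy query results from two hypothetical Web databases.
sources = [
    [{"title": "Data Mining Concepts", "author": "Han Kamber"},
     {"title": "Database Systems", "author": "Ullman Widom"}],
    [{"title": "Data Mining Concepts and Techniques", "author": "Han Kamber"},
     {"title": "Machine Learning", "author": "Mitchell"}],
]
print(iterative_match(sources, ["title", "author"], threshold=0.6))
# {((0, 0), (1, 0))}
```

On the toy records, the first pass uses weights derived only from the two within-source nonduplicate pairs; the one cross-source pair with matching authors and overlapping titles clears the threshold, and the second pass, now also weighting by that duplicate, finds nothing new, so the loop stops.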