ACM SIGMOD International Conference on Management of Data

An interactive clustering-based approach to integrating source query interfaces on the deep Web

Abstract

An increasing number of data sources are now available on the Web, but their contents are often accessible only through query interfaces. For a domain of interest, there are often many such sources with varying coverage or querying capabilities. As an important step toward integrating these sources, we consider the integration of their query interfaces. More specifically, we focus on the crucial step of this integration: accurately matching the interfaces. While the integration of query interfaces has recently received more attention, current approaches are not sufficiently general: (a) they all model interfaces with flat schemas; (b) most of them consider only 1:1 mappings of fields across interfaces; (c) they all perform the integration in a black-box fashion, so the whole process has to be restarted from scratch if anything goes wrong; and (d) they often require laborious parameter tuning. In this paper, we propose an interactive, clustering-based approach to matching query interfaces. The hierarchical nature of interfaces is captured with ordered trees. Various types of complex field mappings are examined, and several approaches are proposed to identify these mappings effectively. We put the human integrator back in the loop and propose several novel approaches to interactively learning parameters and resolving uncertain mappings. Extensive experiments were conducted, and the results show that our approach is highly effective.
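The abstract names two ingredients, interfaces modeled as ordered trees and clustering-based field matching, without giving the algorithms. As a rough illustration only, here is a minimal sketch under assumptions of our own, not the authors' method: each interface is an ordered tree whose leaves are fields; field similarity is a hypothetical Jaccard score over lowercase label tokens; and fields are grouped by single-linkage agglomerative clustering with a hypothetical threshold `tau`, under the constraint that a cluster never holds two fields from the same interface (so only 1:1 matches result). The names `Node`, `jaccard`, `cluster_fields`, and `tau` are all illustrative.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Node:
    """A node in an ordered tree; leaves stand for interface fields (hypothetical model)."""
    label: str
    children: list[Node] = field(default_factory=list)

    def leaves(self) -> list[Node]:
        # Return the fields (leaves) in left-to-right order, preserving tree order.
        if not self.children:
            return [self]
        out: list[Node] = []
        for child in self.children:
            out.extend(child.leaves())
        return out


def tokens(label: str) -> set[str]:
    # Lowercase alphanumeric word tokens of a field label.
    return set(re.findall(r"[a-z0-9]+", label.lower()))


def jaccard(a: str, b: str) -> float:
    # Hypothetical label similarity: |shared tokens| / |all tokens|.
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def cluster_fields(interfaces: list[Node], tau: float = 0.5) -> list[list[tuple[int, str]]]:
    """Single-linkage clustering of fields; each cluster is a set of matched fields."""
    items = [(i, leaf.label) for i, tree in enumerate(interfaces) for leaf in tree.leaves()]
    clusters = [[k] for k in range(len(items))]
    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca, cb = clusters[a], clusters[b]
            # Constraint: two fields of the same interface never share a cluster.
            if {items[k][0] for k in ca} & {items[k][0] for k in cb}:
                continue
            # Single linkage: cluster similarity is the best pairwise field similarity.
            sim = max(jaccard(items[x][1], items[y][1]) for x in ca for y in cb)
            if sim >= tau and (best is None or sim > best[0]):
                best = (sim, a, b)
        if best is None:
            break  # no admissible pair reaches the threshold
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations() guarantees a < b
    return [sorted(items[k] for k in c) for c in clusters]


if __name__ == "__main__":
    flights_a = Node("root", [Node("Departure city"), Node("Arrival city"), Node("Departure date")])
    # "From" is an internal grouping node; only its leaf is a field.
    flights_b = Node("root", [Node("From", [Node("City of departure")]),
                              Node("To city"), Node("Date of departure")])
    for tau in (0.5, 0.3):
        print(tau, cluster_fields([flights_a, flights_b], tau=tau))
```

In this toy run, `tau=0.5` pairs "Departure city" with "City of departure" and "Departure date" with "Date of departure" but leaves "Arrival city" and "To city" unmatched; lowering `tau` to 0.3 merges those two as well. That sensitivity to a single threshold shows concretely why parameter tuning, point (d) in the abstract, matters for clustering-based matchers. The sketch handles neither the complex (non-1:1) mappings nor the tree structure beyond leaf order, both of which the paper addresses.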