Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach

Abstract

To enable information integration, schema matching is a critical step for discovering semantic correspondences of attributes across heterogeneous sources. Although complex matchings are common, most existing techniques focus on simple 1:1 matchings because the search space for complex matchings is far larger. To tackle this challenge, this paper takes a conceptually novel approach by viewing schema matching as correlation mining, for our task of matching Web query interfaces to integrate the myriad databases on the Internet. On this "deep Web," query interfaces generally form complex matchings between attribute groups (e.g., {author} corresponds to {first name, last name} in the Books domain). We observe that co-occurrence patterns across query interfaces often reveal such complex semantic relationships: grouping attributes (e.g., {first name, last name}) tend to be co-present in query interfaces and are thus positively correlated. In contrast, synonym attributes are negatively correlated because they rarely co-occur. This insight enables us to discover complex matchings by a correlation mining approach. In particular, we develop the DCM framework, which consists of data preparation, dual mining of positive and negative correlations, and finally matching selection. Unlike previous correlation mining algorithms, which mainly focus on finding strong positive correlations, our algorithm takes into account both positive and negative correlations, especially the subtlety of negative correlations, owing to their special importance in schema matching. This leads to the introduction of a new correlation measure, H-measure, distinct from those proposed in previous work. We evaluate our approach extensively, and the results show good accuracy in discovering complex matchings.
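The core observation above (grouping attributes co-occur, synonym attributes avoid each other) can be illustrated with a minimal Python sketch. It assumes each query interface is reduced to a set of attribute names. It scores attribute pairs with the classical phi coefficient as a stand-in, because the abstract does not define the paper's H-measure. The toy interfaces, the thresholds (0.8 and -0.8), and the helper names (`contingency`, `phi`, `dual_mine`, `candidate_matchings`) are hypothetical. The group-then-pair selection step is a naive simplification and not the DCM matching-selection algorithm.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: each query interface is the set of attribute names it exposes.
# These interfaces are invented for illustration; they are not the paper's dataset.
INTERFACES = [
    {"title", "author"},
    {"title", "first name", "last name"},
    {"title", "author", "subject"},
    {"title", "first name", "last name", "subject"},
    {"title", "author"},
    {"title", "first name", "last name"},
]


def contingency(interfaces, a, b):
    """Return the 2x2 co-occurrence counts (f11, f10, f01, f00) for attributes a and b."""
    f11 = f10 = f01 = f00 = 0
    for attrs in interfaces:
        has_a, has_b = a in attrs, b in attrs
        if has_a and has_b:
            f11 += 1
        elif has_a:
            f10 += 1
        elif has_b:
            f01 += 1
        else:
            f00 += 1
    return f11, f10, f01, f00


def phi(f11, f10, f01, f00):
    """Classical phi coefficient; a stand-in here, NOT the paper's H-measure."""
    denom = sqrt((f11 + f10) * (f01 + f00) * (f11 + f01) * (f10 + f00))
    if denom == 0:
        # An attribute present in every (or no) interface gives no co-occurrence signal.
        return 0.0
    return (f11 * f00 - f10 * f01) / denom


def dual_mine(interfaces, pos_threshold=0.8, neg_threshold=-0.8):
    """Split attribute pairs into strongly positive and strongly negative lists."""
    attributes = sorted(set().union(*interfaces))
    positive, negative = [], []
    for a, b in combinations(attributes, 2):
        score = phi(*contingency(interfaces, a, b))
        if score >= pos_threshold:
            positive.append((a, b))
        elif score <= neg_threshold:
            negative.append((a, b))
    return attributes, positive, negative


def candidate_matchings(interfaces):
    """Naive selection: merge positive pairs into groups, then pair up groups
    whose every cross-group attribute pair is negatively correlated."""
    attributes, positive, negative = dual_mine(interfaces)
    groups = [{attr} for attr in attributes]
    for a, b in positive:
        ga = next(g for g in groups if a in g)
        gb = next(g for g in groups if b in g)
        if ga is not gb:
            groups.remove(gb)  # groups are disjoint, so this removes exactly gb
            ga |= gb
    neg = {frozenset(pair) for pair in negative}
    matchings = []
    for g1, g2 in combinations(groups, 2):
        if all(frozenset((x, y)) in neg for x in g1 for y in g2):
            matchings.append((g1, g2))
    return matchings


if __name__ == "__main__":
    for g1, g2 in candidate_matchings(INTERFACES):
        print(sorted(g1), "<->", sorted(g2))
    # prints: ['author'] <-> ['first name', 'last name']
```

On these invented interfaces, `first name` and `last name` always appear together (phi = 1.0), while each of them never appears with `author` (phi = -1.0), so the sketch recovers the {author} versus {first name, last name} example from the abstract. `title` appears in every interface, so phi is undefined for it and the sketch scores it 0. Real data would be noisier, which is the kind of subtlety the abstract cites as the motivation for a dedicated measure.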