Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach

Abstract

To enable information integration, schema matching is a critical step for discovering semantic correspondences of attributes across heterogeneous sources. Although complex matchings are common, most existing techniques focus on simple 1:1 matchings because the search space for complex matchings is far larger. To tackle this challenge, this paper takes a conceptually novel approach by viewing schema matching as correlation mining, for our task of matching Web query interfaces to integrate the myriad databases on the Internet. On this "deep Web," query interfaces generally form complex matchings between attribute groups (e.g., {author} corresponds to {first name, last name} in the Books domain). We observe that the co-occurrence patterns across query interfaces often reveal such complex semantic relationships: grouping attributes (e.g., {first name, last name}) tend to be co-present in query interfaces and are thus positively correlated. In contrast, synonym attributes are negatively correlated because they rarely co-occur. This insight enables us to discover complex matchings by a correlation mining approach. In particular, we develop the DCM framework, which consists of data preparation, dual mining of positive and negative correlations, and finally matching selection. Unlike previous correlation mining algorithms, which mainly focus on finding strong positive correlations, our algorithm considers both positive and negative correlations, especially the subtlety of negative correlations, because of their special importance in schema matching. This leads to the introduction of a new correlation measure, the H-measure, distinct from those proposed in previous work. We evaluate our approach extensively, and the results show good accuracy for discovering complex matchings.
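
The co-occurrence observation in the abstract can be illustrated with a small sketch. The toy interfaces below are invented for illustration, and the Jaccard coefficient is used only as a generic stand-in for a correlation measure; it is not the H-measure the paper introduces, and the function names `contingency` and `jaccard` are hypothetical. Under those assumptions, attributes that always appear together score high, and attributes that never appear together score zero.

```python
from itertools import combinations

# Hypothetical toy data: each set holds the attribute names of one query interface.
interfaces = [
    {"author", "title", "isbn"},
    {"first name", "last name", "title"},
    {"author", "title"},
    {"first name", "last name", "isbn"},
    {"author", "isbn"},
    {"first name", "last name", "title", "isbn"},
]

def contingency(a, b, schemas):
    """Count interfaces by presence of a and b: (f11, f10, f01, f00)."""
    f11 = sum(1 for s in schemas if a in s and b in s)
    f10 = sum(1 for s in schemas if a in s and b not in s)
    f01 = sum(1 for s in schemas if a not in s and b in s)
    f00 = len(schemas) - f11 - f10 - f01
    return f11, f10, f01, f00

def jaccard(a, b, schemas):
    """Generic co-occurrence score (stand-in only, not the paper's H-measure)."""
    f11, f10, f01, _ = contingency(a, b, schemas)
    denom = f11 + f10 + f01
    return f11 / denom if denom else 0.0

# Score every attribute pair seen in the toy interfaces.
attributes = sorted(set().union(*interfaces))
scores = {
    (a, b): jaccard(a, b, interfaces)
    for a, b in combinations(attributes, 2)
}

for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

In this toy data, `first name` and `last name` always co-occur (a candidate grouping), while `author` never co-occurs with either of them (a candidate synonym relationship with the group). A low co-occurrence score alone does not establish synonymy, which is part of why the abstract stresses the subtlety of negative correlations.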

Bibliographic details

Source
Association for Computing Machinery (ACM) Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD) International Conference on Knowledge Discovery and Data Mining; August 22-25, 2004; Seattle, WA, US | 2004 | pp. 148-157 | 10 pages
Conference location: Seattle, WA, US
Authors
Bin He; Kevin Chen-Chuan Chang; Jiawei Han;
Author affiliation

Computer Science Department University of Illinois at Urbana-Champaign;

Original format: PDF
Language: English
Chinese Library Classification: Automation system theory
Keywords
data integration; deep web; schema matching; correlation mining; correlation measure;

Similar literature

1. Automatic Complex Schema Matching Across Web Query Interfaces: A Correlation Mining Approach [J]. Bin He, Kevin Chen-Chuan Chang. ACM Transactions on Database Systems. 2006, No. 1
2. ETTA-IM: A deep web query interface matching approach based on evidence theory and task assignment [J]. Dong Yongquan, Li Qingzhong, Ding Yanhui. Expert Systems with Applications. 2011, No. 8
3. An evidential approach to query interface matching on the deep Web [J]. Jun Hong, Zhongtian He, David A. Bell. Information Systems. 2010, No. 2
4. Discovering Complex Matchings across Web Query Interfaces: A Correlation Mining Approach [C]. Bin He, Kevin Chen-Chuan Chang, Jiawei Han. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2004
5. Discovering Deep-web Sources and Extracting Content using Automated Query Generation [D]. Shrestha, Subodh. 2011
6. Effective Filtering of Query Results on Updated User Behavioral Profiles in Web Mining [O]. S. Sadesh, R. C. Suganthe. 2015
7. Mining Complex Matchings across Web Query Interfaces [O]. 2008
