Schema Clustering and Retrieval for Multi-domain Pay-As-You-Go Data Integration Systems

Abstract

A data integration system offers a single interface to multiple structured data sources. Many application contexts (e.g., searching structured data on the web) involve the integration of large numbers of structured data sources. At web scale, it is impractical to use manual or semi-automatic data integration methods, so a pay-as-you-go approach is more appropriate. A pay-as-you-go approach entails using a fully automatic approximate data integration technique to provide an initial data integration system (i.e., an initial mediated schema and initial mappings from source schemas to the mediated schema), and then refining the system as it is used. Previous research has investigated automatic approximate data integration techniques, but all existing techniques require the schemas being integrated to belong to the same conceptual domain. At web scale, it is impractical to classify schemas into domains manually or semi-automatically, which limits the applicability of these techniques. In this paper, we present an approach for clustering schemas into domains without any human intervention, based only on the names of attributes in the schemas. Our clustering approach uses a probabilistic model to handle uncertainty in assigning schemas to domains. We also propose a query classifier that determines, for a given keyword query, the domains most relevant to that query. We experimentally demonstrate the effectiveness of our schema clustering and query classification techniques.
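The pipeline described in the abstract (cluster schemas by attribute names, assign schemas to domains with uncertainty, route a keyword query to domains) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the similarity measure, the clustering procedure, or the probabilistic model, so Jaccard similarity, greedy average-link clustering, and normalized similarities are stand-ins chosen for brevity. The schema names, the `threshold` value, and all function names are hypothetical.

```python
import re
from itertools import combinations

def tokens(attribute_names):
    """Lower-cased word tokens drawn from a list of attribute names."""
    out = set()
    for name in attribute_names:
        out.update(t for t in re.split(r"[^a-z0-9]+", name.lower()) if t)
    return out

def jaccard(a, b):
    """Jaccard similarity of two token sets (an assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_schemas(schemas, threshold=0.2):
    """Greedy average-link agglomerative clustering over token sets.

    Merging stops when the best pair of clusters falls below `threshold`
    (a hypothetical cut-off, not a value taken from the paper).
    """
    feats = {name: tokens(attrs) for name, attrs in schemas.items()}
    clusters = [[name] for name in schemas]

    def avg_sim(c1, c2):
        total = sum(jaccard(feats[x], feats[y]) for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))
        if avg_sim(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, feats

def soft_assignment(schema, clusters, feats):
    """Normalize average similarities into a distribution over clusters.

    A crude stand-in for a probabilistic domain-assignment model.
    """
    sims = [sum(jaccard(feats[schema], feats[m]) for m in c) / len(c)
            for c in clusters]
    total = sum(sims)
    if not total:
        return [1 / len(clusters)] * len(clusters)
    return [s / total for s in sims]

def classify_query(query, clusters, feats):
    """Rank cluster indices by the share of query keywords they contain."""
    q = tokens(query.split())
    scores = []
    for idx, c in enumerate(clusters):
        vocab = set().union(*(feats[m] for m in c))
        scores.append((len(q & vocab) / len(q) if q else 0.0, idx))
    return [idx for _, idx in sorted(scores, key=lambda s: (-s[0], s[1]))]

# Hypothetical toy schemas, identified only by their attribute names.
schemas = {
    "flights_a": ["departure_city", "arrival_city", "airline", "price"],
    "flights_b": ["departure city", "arrival city", "flight number"],
    "books_a": ["title", "author", "isbn", "price"],
    "books_b": ["book title", "author", "publisher"],
}
clusters, feats = cluster_schemas(schemas)
print(clusters)
print(soft_assignment("books_a", clusters, feats))
print(classify_query("cheap flight departure city", clusters, feats))
```

On this toy input the two flight schemas and the two book schemas end up in separate clusters, `books_a` receives a small non-zero weight for the flight cluster because it shares the `price` attribute, and the flight query ranks the flight cluster first. A real system at web scale would need a more careful similarity measure and a principled probabilistic model, as the paper proposes.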

Bibliographic details

Source: ACM SIGMOD International Conference on Management of Data, 2010, 12 pages
Authors: Hatem A. Mahmoud, Ashraf Aboulnaga
Original format: PDF
Chinese Library Classification: programming, software engineering
Keywords: data integration; clustering; classification

Similar literature

1. XML-based integration data model and schema mapping in multidatabase systems [J]. Ruixuan Li, Zhengding Lu, Weijun Xiao. Journal of Systems Engineering and Electronics, 2005, No. 2.
2. XML-based integration data model and schema mapping in multidatabase systems [J]. Li Ruixuan, Lu Zhengding, Xiao Weijun. Journal of Systems Engineering and Electronics (English edition), 2005, No. 2.
3. Clustering Schema Elements for Semantic Integration of Heterogeneous Data Sources [J]. Huimin Zhou, Sudha Ram. Journal of Database Management, 2004, No. 4.
4. Schema Clustering and Retrieval for Multi-domain Pay-As-You-Go Data Integration Systems [C]. Hatem A. Mahmoud, Ashraf Aboulnaga. ACM SIGMOD International Conference on Management of Data (SIGMOD 2010), 2010.
5. Schema and data integration in heterogeneous multidatabase systems [D]. Albert, Joseph. 1996.
6. Bridging the integration gap between imaging and information systems: a uniform data concept for content-based image retrieval in computer-aided diagnosis [O]. Petra Welter, Jörg Riesmeier, Benedikt Fischer. 2011.
7. Schema clustering and retrieval for multi-domain pay-as-you-go data integration systems [O]. Hatem A. Mahmoud, Ashraf Aboulnaga. 2010.
