Schema matching prediction with applications to data source discovery and dynamic ensembling

Tomer Sagi; Avigdor Gal

Abstract

Web-scale data integration involves fully automated efforts which lack knowledge of the exact match between data descriptions. In this paper, we introduce schema matching prediction, an assessment mechanism to support schema matchers in the absence of an exact match. Given attribute pair-wise similarity measures, a predictor predicts the success of a matcher in identifying correct correspondences. We present a comprehensive framework in which predictors can be defined, designed, and evaluated. We formally define schema matching evaluation and schema matching prediction using similarity spaces and discuss a set of four desirable properties of predictors, namely correlation, robustness, tunability, and generalization. We present a method for constructing predictors, supporting generalization, and introduce prediction models as means of tuning prediction toward various quality measures. We define the empirical properties of correlation and robustness and provide concrete measures for their evaluation. We illustrate the usefulness of schema matching prediction by presenting three use cases: We propose a method for ranking the relevance of deep Web sources with respect to given user needs. We show how predictors can assist in the design of schema matching systems. Finally, we show how prediction can support dynamic weight setting of matchers in an ensemble, thus improving upon current state-of-the-art weight setting methods. An extensive empirical evaluation shows the usefulness of predictors in these use cases and demonstrates the usefulness of prediction models in increasing the performance of schema matching.
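
As a rough illustration of the idea described in the abstract, the sketch below computes a predictor over each matcher's attribute pair-wise similarity matrix and uses the predictor values as dynamic ensemble weights. The abstract does not name concrete predictors or a weighting formula, so everything here is an assumption made for illustration: the function names `max_row_predictor` and `dynamic_ensemble` are hypothetical, the predictor (mean of each row's highest similarity) is just one plausible choice, and normalizing predictor values into weights is a simple stand-in rather than the authors' method.

```python
from typing import Dict, List

Matrix = List[List[float]]  # rows: attributes of schema A, columns: attributes of schema B


def max_row_predictor(sim: Matrix) -> float:
    """Hypothetical predictor: mean of the best similarity found in each row.

    A high value suggests the matcher found a confident candidate for most
    attributes, which we assume (for illustration) correlates with quality.
    """
    if not sim:
        return 0.0
    return sum(max(row) for row in sim) / len(sim)


def dynamic_ensemble(matrices: Dict[str, Matrix]) -> Matrix:
    """Combine matcher outputs, weighting each matcher by its predictor value.

    Assumes all matrices have the same, non-empty shape.
    """
    scores = {name: max_row_predictor(m) for name, m in matrices.items()}
    total = sum(scores.values())
    n = len(matrices)
    # Fall back to uniform weights if every predictor value is zero.
    weights = {
        name: (s / total if total > 0 else 1.0 / n) for name, s in scores.items()
    }
    first = next(iter(matrices.values()))
    rows, cols = len(first), len(first[0])
    return [
        [sum(weights[name] * m[i][j] for name, m in matrices.items())
         for j in range(cols)]
        for i in range(rows)
    ]


if __name__ == "__main__":
    # Two made-up matchers: one decisive, one uninformative.
    outputs = {
        "term": [[0.9, 0.1], [0.2, 0.8]],
        "instance": [[0.5, 0.5], [0.5, 0.5]],
    }
    for row in dynamic_ensemble(outputs):
        print([round(v, 3) for v in row])
```

In this toy input the decisive matcher receives the larger weight, so the combined matrix still favors the diagonal correspondences. No ground-truth match is needed to set the weights, which is the setting the abstract targets.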


Bibliographic record

Source: The VLDB Journal, 2013, Issue 5, pp. 689-710 (22 pages)
Authors: Tomer Sagi; Avigdor Gal
Author affiliation: Technion-Israel Institute of Technology (both authors)
Indexed in: Science Citation Index (SCI)
Original format: PDF
Language: English
Keywords: Data integration; Schema matching; Prediction


