Published in: The VLDB Journal

Schema matching prediction with applications to data source discovery and dynamic ensembling

Abstract

Web-scale data integration involves fully automated efforts that lack knowledge of the exact match between data descriptions. In this paper, we introduce schema matching prediction, an assessment mechanism that supports schema matchers in the absence of an exact match. Given attribute pair-wise similarity measures, a predictor predicts how successful a matcher will be at identifying correct correspondences. We present a comprehensive framework in which predictors can be defined, designed, and evaluated. We formally define schema matching evaluation and schema matching prediction using similarity spaces, and we discuss four desirable properties of predictors, namely correlation, robustness, tunability, and generalization. We present a method for constructing predictors that supports generalization, and we introduce prediction models as a means of tuning prediction toward various quality measures. We define the empirical properties of correlation and robustness and provide concrete measures for their evaluation. We illustrate the usefulness of schema matching prediction with three use cases. First, we propose a method for ranking the relevance of deep Web sources with respect to given user needs. Second, we show how predictors can assist in the design of schema matching systems. Finally, we show how prediction can support dynamic weight setting of matchers in an ensemble, improving upon current state-of-the-art weight-setting methods. An extensive empirical evaluation shows the usefulness of predictors in these use cases and demonstrates that prediction models can increase the performance of schema matching.
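The abstract combines three technical ideas. First, a predictor scores a matcher's attribute pair-wise similarity matrix. Second, the predictor's correlation with actual match quality is checked empirically. Third, predictor scores drive per-input ensemble weights.

The Python below is a minimal sketch of those ideas under stated assumptions:

* `max_row_predictor`, `pearson`, and `weighted_ensemble` are hypothetical names.
* The max-per-row heuristic is an illustrative predictor chosen for brevity, not one taken from the paper.
* Setting weights proportional to predictor scores is one simple weighting rule, not the authors' method.

```python
import math
from statistics import mean
from typing import Callable, List

# sim[i][j]: similarity a matcher assigns to (source attribute i, target attribute j).
# Assumption: scores lie in [0, 1], so predictor values are non-negative.
Matrix = List[List[float]]


def max_row_predictor(sim: Matrix) -> float:
    """Hypothetical predictor: mean of each source attribute's best score.

    Illustrative intuition: if every source attribute has one strong
    candidate, the matcher is more likely to pick correct correspondences.
    """
    return mean(max(row) for row in sim)


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation between predictor values and observed quality
    (e.g., precision against a known exact match) over several matching tasks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def weighted_ensemble(
    matrices: List[Matrix],
    predictor: Callable[[Matrix], float] = max_row_predictor,
) -> Matrix:
    """Combine matchers' matrices using weights proportional to predictor scores.

    Weights are recomputed for every schema pair rather than fixed in advance,
    which is one simple reading of "dynamic" weight setting.
    """
    scores = [predictor(m) for m in matrices]
    total = sum(scores)
    if total > 0:
        weights = [s / total for s in scores]
    else:
        # Fall back to uniform weights when no matcher scores above zero.
        weights = [1.0 / len(matrices)] * len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    # Element-wise weighted average of all matchers' similarity matrices.
    return [
        [sum(w * m[i][j] for w, m in zip(weights, matrices)) for j in range(cols)]
        for i in range(rows)
    ]
```

For example, `weighted_ensemble([m1, m2])` gives more weight to whichever matrix the predictor scores higher. Computing `pearson` between predictor values and per-task quality, on a benchmark where the exact match is known, is one way to measure the correlation property the abstract names.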