Journal: Machine Learning

Learning to Match the Schemas of Data Sources: A Multistrategy Approach

Abstract

The problem of integrating data from multiple data sources (either on the Internet or within enterprises) has received much attention in the database and AI communities. The focus has been on building data integration systems that provide a uniform query interface to the sources. A key bottleneck in building such systems has been the laborious manual construction of semantic mappings between the query interface and the source schemas. Examples of mappings are "element location maps to address" and "price maps to listed-price". We propose a multistrategy learning approach to automatically find such mappings. The approach applies multiple learner modules, where each module exploits a different type of information either in the schemas of the sources or in their data, then combines the predictions of the modules using a meta-learner. Learner modules employ a variety of techniques, ranging from Naive Bayes and nearest-neighbor classification to entity recognition and information retrieval. We describe the LSD system, which employs this approach to find semantic mappings. To further improve matching accuracy, LSD exploits domain integrity constraints, user feedback, and nested structures in XML data. We test LSD experimentally on several real-world domains. The experiments validate the utility of multistrategy learning for data integration and show that LSD proposes semantic mappings with a high degree of accuracy.
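
The multistrategy idea in the abstract can be illustrated with a small sketch. This is not the LSD implementation: the training data, the two toy learners, the function names, and the fixed meta-learner weights are all assumptions made for illustration. One learner looks only at element names (a nearest-neighbor match on string similarity), the other looks only at data values (a small Naive Bayes over tokens), and a weighted average stands in for the meta-learner.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher

LABELS = ["address", "listed-price"]  # mediated-schema elements (from the abstract's examples)

# Hypothetical training data: (source element name, sample values, known label).
TRAINING = [
    ("location", ["12 Oak St, Seattle", "4 Pine Ave, Portland"], "address"),
    ("price", ["$250,000", "$99,500"], "listed-price"),
]

def normalize(scores):
    """Scale scores so they sum to 1 (uniform if all are zero)."""
    total = sum(scores.values())
    if total == 0:
        return {label: 1.0 / len(scores) for label in scores}
    return {label: s / total for label, s in scores.items()}

def tokenize(value):
    """Split a value into words, numbers, and symbols; collapse numbers to <num>."""
    tokens = re.findall(r"[A-Za-z]+|\d+|[^\sA-Za-z\d]", value.lower())
    return ["<num>" if t.isdigit() else t for t in tokens]

def name_learner(name, values):
    """Schema-based learner: nearest training element name by string similarity."""
    scores = {label: 0.0 for label in LABELS}
    for train_name, _, label in TRAINING:
        sim = SequenceMatcher(None, name.lower(), train_name.lower()).ratio()
        scores[label] = max(scores[label], sim)
    return normalize(scores)

def content_learner(name, values):
    """Data-based learner: Naive Bayes over value tokens, equal class priors."""
    counts = {label: Counter() for label in LABELS}
    for _, train_values, label in TRAINING:
        for v in train_values:
            counts[label].update(tokenize(v))
    vocab_size = len(set().union(*counts.values())) + 1  # +1 slot for unseen tokens
    log_scores = {}
    for label in LABELS:
        total = sum(counts[label].values())
        log_scores[label] = sum(
            math.log((counts[label][tok] + 1) / (total + vocab_size))  # Laplace smoothing
            for v in values for tok in tokenize(v)
        )
    best = max(log_scores.values())  # subtract the max before exp to avoid underflow
    return normalize({l: math.exp(s - best) for l, s in log_scores.items()})

# Assumed fixed weights; a real meta-learner would learn these from training data.
LEARNERS = [(name_learner, 0.4), (content_learner, 0.6)]

def meta_learner(name, values):
    """Combine the learners' predictions as a weighted average per label."""
    combined = {label: 0.0 for label in LABELS}
    for learner, weight in LEARNERS:
        for label, p in learner(name, values).items():
            combined[label] += weight * p
    return combined

def match(name, values):
    """Return the mediated-schema label with the highest combined score."""
    combined = meta_learner(name, values)
    return max(combined, key=combined.get)

if __name__ == "__main__":
    print(match("asking_price", ["$310,000", "$75,900"]))
    print(match("home_location", ["7 Oak Ave, Seattle"]))
```

With this toy data, the first call should map to listed-price and the second to address. The point of the design is that each learner can be weak on its own (an uninformative element name, or values full of unseen tokens), while the combination draws on whichever source of evidence is stronger.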
