Published in: ACM SIGMOD International Conference on Management of Data

Automatically Incorporating New Sources in Keyword Search-Based Data Integration

Abstract

Scientific data offers some of the most interesting challenges in data integration today. Scientific fields evolve rapidly and accumulate masses of observational and experimental data that needs to be annotated, revised, interlinked, and made available to other scientists. From the perspective of the user, this can be a major headache as the data they seek may initially be spread across many databases in need of integration. Worse, even if users are given a solution that integrates the current state of the source databases, new data sources appear with new data items of interest to the user. Here we build upon recent ideas for creating integrated views over data sources using keyword search techniques, ranked answers, and user feedback [32] to investigate how to automatically discover when a new data source has content relevant to a user's view - in essence, performing automatic data integration for incoming data sets. The new architecture accommodates a variety of methods to discover related attributes, including label propagation algorithms from the machine learning community [2] and existing schema matchers [11]. The user may provide feedback on the suggested new results, helping the system repair any bad alignments or increase the cost of including a new source that is not useful. We evaluate our approach on actual bioinformatics schemas and data, using state-of-the-art schema matchers as components. We also discuss how our architecture can be adapted to more traditional settings with a mediated schema.
