Automatically Incorporating New Sources in Keyword Search-Based Data Integration

Abstract

Scientific data offers some of the most interesting challenges in data integration today. Scientific fields evolve rapidly and accumulate masses of observational and experimental data that needs to be annotated, revised, interlinked, and made available to other scientists. From the perspective of the user, this can be a major headache as the data they seek may initially be spread across many databases in need of integration. Worse, even if users are given a solution that integrates the current state of the source databases, new data sources appear with new data items of interest to the user. Here we build upon recent ideas for creating integrated views over data sources using keyword search techniques, ranked answers, and user feedback [32] to investigate how to automatically discover when a new data source has content relevant to a user's view - in essence, performing automatic data integration for incoming data sets. The new architecture accommodates a variety of methods to discover related attributes, including label propagation algorithms from the machine learning community [2] and existing schema matchers [11]. The user may provide feedback on the suggested new results, helping the system repair any bad alignments or increase the cost of including a new source that is not useful. We evaluate our approach on actual bioinformatics schemas and data, using state-of-the-art schema matchers as components. We also discuss how our architecture can be adapted to more traditional settings with a mediated schema.
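The abstract names label propagation as one way to discover related attributes. The following is a minimal sketch of generic iterative label propagation with clamped seeds, not the specific algorithm of [2] or the authors' implementation. It assumes a graph whose nodes are attributes and data values, with an edge wherever an attribute contains a value. The function names, node names, and edge weights are all hypothetical.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, iterations=20):
    """Spread label scores from seed nodes over an undirected weighted graph.

    edges: iterable of (node_a, node_b, weight) tuples
    seeds: dict mapping a node to its known label
    Returns a dict mapping each node to a dict of label -> score.
    """
    neighbors = defaultdict(list)
    for a, b, w in edges:
        neighbors[a].append((b, w))
        neighbors[b].append((a, w))

    # Seed nodes start with full confidence in their own label.
    scores = {node: {} for node in neighbors}
    for node, label in seeds.items():
        scores[node] = {label: 1.0}

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                # Clamp seeds so known labels are never overwritten.
                updated[node] = {seeds[node]: 1.0}
                continue
            total = sum(w for _, w in nbrs)
            acc = defaultdict(float)
            # Each node takes the weighted average of its neighbors' scores.
            for nbr, w in nbrs:
                for label, score in scores[nbr].items():
                    acc[label] += w * score / total
            updated[node] = dict(acc)
        scores = updated
    return scores


def best_label(scores, node):
    """Return the highest-scoring label for a node, or None if it has none."""
    labels = scores.get(node, {})
    return max(labels, key=labels.get) if labels else None


# Hypothetical toy data: one existing source (src1) and one new source (new).
EXAMPLE_EDGES = [
    ("src1.gene_name", "val:BRCA1", 1.0),
    ("src1.gene_name", "val:TP53", 1.0),
    ("new.symbol", "val:BRCA1", 1.0),
    ("new.symbol", "val:TP53", 1.0),
    ("src1.organism", "val:human", 1.0),
    ("new.species", "val:human", 1.0),
    ("new.species", "val:mouse", 1.0),
    ("new.symbol", "val:human", 0.2),  # weak, noisy overlap
]
EXAMPLE_SEEDS = {"src1.gene_name": "gene_name", "src1.organism": "organism"}

if __name__ == "__main__":
    result = propagate_labels(EXAMPLE_EDGES, EXAMPLE_SEEDS)
    for attr in ("new.symbol", "new.species"):
        print(attr, "->", best_label(result, attr))
```

In this toy setup, attributes of the new source pick up labels from existing attributes that share values with them. A real system would then pass such candidate alignments to the ranking and user-feedback steps that the abstract describes.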

Bibliographic details

Source: ACM SIGMOD International Conference on Management of Data | 2010 | 12 pages
Authors: Partha Pratim Talukdar, Zachary G. Ives, Fernando Pereira
Original format: PDF
Chinese Library Classification: Programming, software engineering
Keywords: machine learning; user feedback; schema matching; schema alignment; keyword search; data integration

Similar literature

1. Active learning in keyword search-based data integration [J] . Yan Zhepeng, Zheng Nan, Ives Zachary G., The VLDB journal . 2015, No. 5

2. NoSQL data model for semi-automatic integration of ethnomedicinal plant data from multiple sources. [J] . Ningthoujam S. S., Choudhury M. D., Potsangbam K. S., Phytochemical Analysis . 2014, No. 6

3. Top-K data source selection for keyword queries over multiple XML data sources [J] . Khanh Nguyen, Jinli Cao, Journal of Information Science . 2012, No. 2

4. Automatically Incorporating New Sources in Keyword Search-Based Data Integration [C] . Partha Pratim Talukdar, Zachary G. Ives, Fernando Pereira, ACM SIGMOD international conference on management of data; SIGMOD 2010 . 2010

5. A flexible automatically adaptive surface nuclear magnetic resonance modelling and inversion framework incorporating complex data and static dephasing dynamics. [D] . Irons, Trevor P. 2013

6. PANDORA: keyword-based analysis of protein sets by integration of annotation sources [O] . Noam Kaplan, Avishay Vaaknin, Michal Linial 2003

7. Automatically Incorporating New Sources in Keyword Search-Based Data Integration [O] . Partha Pratim Talukdar, Zachary G. Ives, Fernando Pereira 2011

8. Web-Scale Search-Based Data Extraction and Integration [R] . Chang, K. C., Shuck, T., Kabra, G. 2011
