Actively Soliciting Feedback for Query Answers in Keyword Search-Based Data Integration

Abstract

The problem of scaling up data integration, such that new sources can be quickly utilized as they are discovered, remains elusive: global schemas for integrated data are difficult to develop and expand, and schema and record matching techniques are limited by the fact that data and metadata are often under-specified and must be disambiguated by data experts. One promising approach is to avoid using a global schema, and instead to develop keyword search-based data integration, in which the system lazily discovers associations enabling it to join together matches to keywords, and returns ranked results. The user is expected to understand the data domain and provide feedback about answers' quality. The system generalizes such feedback to learn how to correctly integrate data. A major open challenge is that under this model, the user only sees and offers feedback on a few "top-k" results: this result set must be carefully selected to include both answers of high relevance and answers that are highly informative when feedback is given on them. Existing systems merely focus on predicting relevance, by composing the scores of various schema and record matching algorithms. In this paper we show how to predict the uncertainty associated with a query result's score, as well as how informative feedback is on a given result. We build upon these foundations to develop an active learning approach to keyword search-based data integration, and we validate the effectiveness of our solution over real data from several very different domains.
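
The abstract does not state how the top-k set is actually assembled, so the following is only a minimal sketch of the general idea it describes: show some results because they look relevant, and others because feedback on them would be informative. Everything here is an assumption made for illustration: the `Candidate` type, the `relevance` and `uncertainty` fields, the `select_top_k` function, and the fixed `explore_slots` split are hypothetical and are not the paper's method.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    """A query answer with hypothetical, precomputed scores."""
    answer_id: str
    relevance: float    # predicted relevance of the answer (assumed given)
    uncertainty: float  # predicted uncertainty of that score (assumed given)


def select_top_k(candidates: List[Candidate], k: int,
                 explore_slots: int) -> List[Candidate]:
    """Fill k - explore_slots positions with the most relevant answers,
    then fill the remaining positions with the most uncertain answers
    among those not yet chosen."""
    if not 0 <= explore_slots <= k:
        raise ValueError("explore_slots must be between 0 and k")
    by_relevance = sorted(candidates, key=lambda c: c.relevance, reverse=True)
    exploit = by_relevance[: k - explore_slots]
    remaining = by_relevance[k - explore_slots:]
    explore = sorted(remaining, key=lambda c: c.uncertainty,
                     reverse=True)[:explore_slots]
    return exploit + explore


candidates = [
    Candidate("a", relevance=0.9, uncertainty=0.10),
    Candidate("b", relevance=0.8, uncertainty=0.05),
    Candidate("c", relevance=0.5, uncertainty=0.20),
    Candidate("d", relevance=0.4, uncertainty=0.60),
]
shown = select_top_k(candidates, k=3, explore_slots=1)
print([c.answer_id for c in shown])
```

With these made-up scores, the two most relevant answers are shown first and the last slot goes to the answer whose score is least certain, rather than to the third most relevant one. A fixed slot split is the simplest possible policy; using uncertainty as a stand-in for informativeness is also a simplification, since the abstract treats the two as separate predictions.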

Bibliographic details

Source
International Conference on Very Large Data Bases | 2013 | 12 pages
Authors
Zhepeng Yan; Nan Zheng; Zachary G. Ives; Partha Pratim Talukdar; Cong Yu
Original format: PDF
Chinese Library Classification: TP311.13

Similar literature

1. Active learning in keyword search-based data integration [J]. Yan Zhepeng, Zheng Nan, Ives Zachary G. The VLDB Journal. 2015, No. 5
2. Efficient processing of keyword queries over graph databases for finding effective answers [J]. Chang-Sup Park, Sungchae Lim. Information Processing & Management. 2015, No. 1
3. Answering Top-k Keyword Queries on Relational Databases [J]. Myint Myint Them, Mie Mie Su Thwin. International Journal of Information Retrieval Research. 2012, No. 3
4. Actively Soliciting Feedback for Query Answers in Keyword Search-Based Data Integration [C]. Zhepeng Yan, Nan Zheng, Zachary G. Ives. International Conference on Very Large Data Bases. 2013
5. A Mediator-based Data Integration System for Query Answering using an Optimized Extended Inverse Rules Algorithm [D]. Jayaraman, Gayathri. 2010
6. Using ontology databases for scalable query answering inconsistency detection and data integration [O]. Paea LePendu, Dejing Dou
7. Actively soliciting feedback for query answers in keyword search-based data integration [O]. Zhepeng Yan, Nan Zheng, Zachary G. Ives. 2015
