Conference: International Conference on Very Large Data Bases

Actively Soliciting Feedback for Query Answers in Keyword Search-Based Data Integration

Abstract

The problem of scaling up data integration, such that new sources can be quickly utilized as they are discovered, remains elusive: global schemas for integrated data are difficult to develop and expand, and schema and record matching techniques are limited by the fact that data and metadata are often under-specified and must be disambiguated by data experts. One promising approach is to avoid using a global schema, and instead to develop keyword search-based data integration, where the system lazily discovers associations that let it join together matches to keywords, and returns ranked results. The user is expected to understand the data domain and provide feedback about answers' quality. The system generalizes such feedback to learn how to correctly integrate data. A major open challenge is that under this model, the user only sees and offers feedback on a few "top-k" results: this result set must be carefully selected to include answers of high relevance and answers that are highly informative when feedback is given on them. Existing systems merely focus on predicting relevance, by composing the scores of various schema and record matching algorithms. In this paper we show how to predict the uncertainty associated with a query result's score, as well as how informative feedback is on a given result. We build upon these foundations to develop an active learning approach to keyword search-based data integration, and we validate the effectiveness of our solution over real data from several very different domains.
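
The trade-off the abstract describes (showing answers that are relevant versus answers whose feedback would be informative) can be illustrated with a small sketch. It is not the paper's method. It assumes, hypothetically, that an answer's score is a weighted sum of per-matcher features, that each learned weight is an independent Gaussian (so the score has a mean and a standard deviation), and that the top-k set is chosen with an upper-confidence-bound criterion, a standard active-learning heuristic swapped in here. The names `Answer`, `score_mean_and_std`, `select_top_k`, and `beta` are illustrative only.

```python
import math
from dataclasses import dataclass


@dataclass
class Answer:
    """A candidate query answer (hypothetical representation)."""
    name: str
    # Hypothetical features: one score per schema/record matcher, aggregated
    # over the associations (joins) this answer relies on.
    features: list[float]


def score_mean_and_std(features, w_mean, w_var):
    """Mean and standard deviation of a linear score under weight uncertainty.

    Assumes score = sum_i f_i * w_i with independent Gaussian weights
    w_i ~ N(w_mean[i], w_var[i]), so Var(score) = sum_i f_i^2 * w_var[i].
    """
    mean = sum(f * m for f, m in zip(features, w_mean))
    var = sum(f * f * v for f, v in zip(features, w_var))
    return mean, math.sqrt(var)


def select_top_k(answers, w_mean, w_var, k, beta=1.0):
    """Pick k answers to show the user, trading relevance against uncertainty.

    Ranks by mean + beta * std (an upper-confidence-bound heuristic):
    beta = 0 ranks purely by predicted relevance; larger beta favours answers
    whose score is uncertain, i.e. where feedback is likely more informative.
    """
    ranked = []
    for a in answers:
        mean, std = score_mean_and_std(a.features, w_mean, w_var)
        ranked.append((mean + beta * std, a.name))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in ranked[:k]]


answers = [
    Answer("A", [1.0, 0.0]),  # relies only on a well-understood matcher
    Answer("B", [0.0, 1.0]),  # relies only on an uncertain matcher
    Answer("C", [0.5, 0.5]),  # mixes both matchers
]
w_mean = [0.8, 0.6]   # hypothetical learned matcher weights
w_var = [0.01, 0.25]  # hypothetical uncertainty in those weights

print(select_top_k(answers, w_mean, w_var, k=2, beta=0.0))  # ['A', 'C']
print(select_top_k(answers, w_mean, w_var, k=2, beta=1.0))  # ['B', 'C']
```

With `beta=0` the sketch simply returns the highest-scoring answers, which corresponds to the relevance-only ranking the abstract attributes to existing systems; raising `beta` promotes answer B, whose score depends on the poorly understood matcher, so that user feedback on it would most reduce the model's uncertainty.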