Venue: ACM SIGMOD International Conference on Management of Data

Optimizing Content Freshness of Relations Extracted From the Web Using Keyword Search

Abstract

An increasing number of applications operate on data obtained from the Web. These applications typically maintain local copies of the web data to avoid network latency in data accesses. As the data on the Web evolves, it is critical that the local copy be kept up to date. Data freshness is one of the most important data quality issues, and has been extensively studied for various applications including web crawling. However, web crawling is focused on obtaining as many raw web pages as possible. Our applications, on the other hand, are interested in specific content from specific data sources. Knowing the content or the semantics of the data enables us to differentiate data items based on their importance and volatility, which are key factors that impact the design of the data synchronization strategy. In this work, we formulate the concept of content freshness, and present a novel approach that maintains content freshness with the least amount of web communication. Specifically, we assume data is accessible through a general keyword search interface, and we form keyword queries based on their selectivity, as well as their contribution to content freshness of the local copy. Experiments show the effectiveness of our approach compared with several naive methods for keeping data fresh.
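
The abstract does not give the query-selection procedure, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes each local item carries a hypothetical importance weight and a hypothetical volatility estimate, that a keyword's selectivity can be estimated by counting matching items in the local copy, and that communication cost is the number of results a query returns. All names (`Item`, `pick_queries`, `budget`) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    key: str
    keywords: frozenset
    importance: float  # assumed weight of the item for the application
    volatility: float  # assumed probability the item changed since last sync


def selectivity(keyword, items):
    """Estimate how many results a keyword query returns, using the local copy."""
    return sum(1 for it in items if keyword in it.keywords)


def expected_gain(keyword, items, refreshed):
    """Sum importance * volatility over matching items not yet refreshed."""
    return sum(
        it.importance * it.volatility
        for it in items
        if keyword in it.keywords and it.key not in refreshed
    )


def pick_queries(items, budget):
    """Greedily pick keywords with the best gain per returned result.

    `budget` caps the total number of results fetched (a stand-in for
    web communication cost).
    """
    refreshed, chosen, remaining = set(), [], budget
    candidates = set().union(*(it.keywords for it in items))
    while candidates:
        best, best_score = None, 0.0
        for kw in sorted(candidates):  # sorted for deterministic tie-breaking
            cost = selectivity(kw, items)
            if cost == 0 or cost > remaining:
                continue
            score = expected_gain(kw, items, refreshed) / cost
            if score > best_score:
                best, best_score = kw, score
        if best is None:
            break
        chosen.append(best)
        remaining -= selectivity(best, items)
        refreshed |= {it.key for it in items if best in it.keywords}
        candidates.discard(best)
    return chosen


items = [
    Item("a", frozenset({"jazz", "nyc"}), importance=1.0, volatility=0.9),
    Item("b", frozenset({"jazz"}), importance=1.0, volatility=0.1),
    Item("c", frozenset({"nyc"}), importance=2.0, volatility=0.5),
    Item("d", frozenset({"rock"}), importance=1.0, volatility=0.2),
]
print(pick_queries(items, budget=3))  # prints ['nyc', 'rock']
```

In this toy data, "nyc" is chosen first because it covers the two items with the highest importance-weighted volatility at a cost of two results; "jazz" is then too expensive for the remaining budget, so "rock" is chosen.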