2nd International Conference on Web Search and Data Mining, 2009

The Web Changes Everything: Understanding the Dynamics of Web Content


Abstract

The Web is a dynamic, ever changing collection of information. This paper explores changes in Web content by analyzing a crawl of 55,000 Web pages, selected to represent different user visitation patterns. Although change over long intervals has been explored on random (and potentially unvisited) samples of Web pages, little is known about the nature of finer grained changes to pages that are actively consumed by users, such as those in our sample. We describe algorithms, analyses, and models for characterizing changes in Web content, focusing on both time (by using hourly and sub-hourly crawls) and structure (by looking at page-, DOM-, and term-level changes). Change rates are higher in our behavior-based sample than found in previous work on randomly sampled pages, with a large portion of pages changing more than hourly. Detailed content and structure analyses identify stable and dynamic content within each page. The understanding of Web change we develop in this paper has implications for tools designed to help people interact with dynamic Web content, such as search engines, advertising, and Web browsers.
