
AJAX Crawl: Making AJAX Applications Searchable



Abstract

Current search engines such as Google and Yahoo! are the prevalent way to search the Web. Search over dynamic client-side Web pages, however, is either nonexistent or far from perfect, and it is not addressed by existing work, for example on the Deep Web. This is a real impediment, since AJAX and Rich Internet Applications are already very common on the Web. AJAX applications are composed of states that the user can see but the search engine cannot, and that the user changes through client-side events. Current search engines either ignore AJAX applications or produce false negatives. The reason is that crawling client-side code is a difficult problem that cannot be solved naively by invoking user events. The challenges are: lack of caching, duplicate state detection, very granular events, reducing the number of AJAX calls, and infinite event invocation. This paper sets the stage for this new search challenge and proposes a solution: it shows how an AJAX Web application can be crawled at the granularity of its application states. A model of AJAX Web sites is presented. An AJAX crawler and optimizations for caching and duplicate elimination are defined. Finally, the gain in search result quality and the corresponding performance cost are evaluated on YouTube, a real AJAX application.
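The idea of crawling at the granularity of application states, with duplicate elimination, can be illustrated with a minimal sketch. This is not the paper's crawler: the toy `fire` function stands in for a real browser that invokes client-side events, and the names `state_id`, `crawl`, `fire`, and `max_states` are all hypothetical. The sketch assumes a state can be identified by a hash of its content and that a fixed cap on the number of states is an acceptable guard against infinite event invocation.

```python
import hashlib
from collections import deque
from typing import Callable, Dict, List, Tuple


def state_id(content: str) -> str:
    """Identify a state by a hash of its content (assumed duplicate test)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def crawl(
    initial: str,
    events: List[str],
    fire: Callable[[str, str], str],
    max_states: int = 100,
) -> Tuple[Dict[str, str], List[Tuple[str, str, str]]]:
    """Breadth-first exploration of application states.

    Returns the distinct states (id -> content) and the recorded
    transitions (source id, event name, target id).
    """
    states = {state_id(initial): initial}
    transitions = []
    queue = deque([initial])
    # The cap bounds the crawl even if events keep producing new states.
    while queue and len(states) < max_states:
        current = queue.popleft()
        for event in events:
            nxt = fire(current, event)
            sid = state_id(nxt)
            if sid not in states:
                if len(states) >= max_states:
                    continue  # cap reached: do not record the new state
                states[sid] = nxt
                queue.append(nxt)
            transitions.append((state_id(current), event, sid))
    return states, transitions


def fire(content: str, event: str) -> str:
    """Toy application: three comment pages with 'next' and 'prev' events."""
    page = int(content.split()[-1])
    if event == "next":
        page = min(page + 1, 3)
    elif event == "prev":
        page = max(page - 1, 1)
    return f"comments page {page}"


states, transitions = crawl("comments page 1", ["next", "prev"], fire)
print(len(states), "distinct states,", len(transitions), "transitions")
```

In this toy setting, firing `prev` on the first page or `next` on the last page leads back to a state that has already been seen, so the hash lookup keeps it from being queued again. A real crawler would also need the caching and AJAX-call reduction that the abstract lists, which this sketch leaves out.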
