Published in: 32nd International Conference on Very Large Data Bases (VLDB 2006), vol. 2

Automatic Extraction of Dynamic Record Sections From Search Engine Result Pages

Abstract

A result page returned by a search engine may contain search results organized into multiple dynamically generated sections in response to a user query. Such a page often also contains information irrelevant to the query, such as information about the site hosting the search engine. In this paper, we present a method to automatically generate wrappers for extracting search result records from all dynamic sections on result pages returned by search engines. This method has the following novel features: (1) it aims to explicitly identify all dynamic sections, including those that are not seen on the sample result pages used to generate the wrapper, and (2) it addresses the issue of correctly differentiating sections and records. Experimental results indicate that this method is very promising. Automatic search result record extraction is critical for applications that need to interact with search engines, such as the automatic construction and maintenance of metasearch engines and deep Web crawling.
