Venue: Distributed Computing Systems (ICDCS), 2012 IEEE 32nd International Conference on

Dash: A Novel Search Engine for Database-Generated Dynamic Web Pages



Abstract

Database-generated dynamic web pages (db-pages for short), whose contents are created on the fly by web applications and databases, are now prominent on the web. However, many of them cannot be searched by existing search engines. Accordingly, we develop a novel search engine named Dash, which stands for Db-pAge Search, to support db-page search. Dash determines the db-pages that a target web application and its database could generate by exploring the application code and the related database content, and it supports keyword search on those db-pages. In this paper, we present its system design and focus on efficiency. To minimize the costs of collecting, maintaining, indexing, and searching a massive number of db-pages that may have overlapping contents, Dash derives and indexes db-page fragments in place of db-pages. Each db-page fragment carries a disjoint part of a db-page. To efficiently compute and index db-page fragments from huge datasets, Dash is equipped with MapReduce-based algorithms for database crawling and db-page fragment indexing. In addition, Dash has a top-k search algorithm that efficiently assembles db-page fragments into db-pages relevant to the search keywords and returns the k most relevant ones. The performance of Dash is evaluated through extensive experiments.
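The idea of indexing fragments once and assembling pages at query time can be illustrated with a toy sketch. This is not the paper's algorithm: the abstract gives no details of Dash's crawling, indexing, or ranking, so everything here is an assumption made for illustration. The function names `build_fragment_index` and `top_k_pages`, the sample fragments, the page-to-fragment mapping, and the scoring rule (summing raw keyword counts) are all hypothetical. The MapReduce stage is omitted entirely.

```python
import heapq
from collections import Counter, defaultdict


def build_fragment_index(fragments):
    """Build an inverted index: keyword -> {fragment_id: term frequency}.

    Each fragment is indexed once, even if many pages contain it.
    """
    index = defaultdict(dict)
    for frag_id, text in fragments.items():
        for word, count in Counter(text.lower().split()).items():
            index[word][frag_id] = count
    return index


def top_k_pages(index, pages, keywords, k):
    """Return the k highest-scoring (page_id, score) pairs.

    Hypothetical scoring: a page's score is the sum of keyword
    occurrences over the fragments it is assembled from.
    """
    # Score fragments first, so shared fragments are scored only once.
    frag_scores = Counter()
    for word in keywords:
        for frag_id, count in index.get(word.lower(), {}).items():
            frag_scores[frag_id] += count

    # Assemble page scores from fragment scores; skip pages with no hits.
    page_scores = {}
    for page_id, frag_ids in pages.items():
        score = sum(frag_scores[f] for f in frag_ids)
        if score > 0:
            page_scores[page_id] = score

    # Highest score first; ties broken by page id for determinism.
    return heapq.nsmallest(
        k, page_scores.items(), key=lambda item: (-item[1], item[0])
    )


# Hypothetical data: three disjoint fragments, three overlapping pages.
fragments = {
    "f1": "thai noodle house",
    "f2": "thai curry kitchen",
    "f3": "burger grill",
}
pages = {
    "price<=10": ["f1"],
    "price<=20": ["f1", "f2"],
    "price<=30": ["f1", "f2", "f3"],
}

index = build_fragment_index(fragments)
results = top_k_pages(index, pages, ["thai", "curry"], k=2)
print(results)
```

In this sketch the three pages overlap heavily, yet each fragment is stored and indexed only once, which is the cost saving the abstract attributes to fragment-level indexing. A real ranking function would likely normalize for page length, which this toy scoring does not.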
