International Conference on Very Large Data Bases (VLDB 2004); August 31 to September 3, 2004; Toronto, Canada

COMPASS: A Concept-based Web Search Engine for HTML, XML, and Deep Web Data

Abstract

Today's web search engines are still following the paradigm of keyword-based search. Although this is the best choice for large scale search engines in terms of throughput and scalability, it inherently limits the ability to accomplish more meaningful query tasks. XML query engines (e.g., based on XQuery or XPath), on the other hand, have powerful query capabilities; but at the same time their dedication to XML data with a global schema is their weakness, because most web information is still stored in diverse formats and does not conform to common schemas. Typical web formats include static HTML pages or pages that are generated dynamically from underlying database systems, accessible only through portal interfaces. We have developed an expressive style of concept-based and context-aware querying with relevance ranking that encompasses different, non-schematic data formats and integrates Web Services as well as Deep Web sources. Coined COMPASS (Context-Oriented Multi-Format Portal-Aware Search System), our system features this new language that combines the simplicity of web search engines with the expressiveness of (simple forms of) XML query languages.
