
Considering Hyper Documents and Context for indexing the Web


Abstract

The growth of the Web, with hundreds of millions of users and billions of pages, poses new challenges for Information Retrieval (IR). Most current systems reuse traditional models that were developed for textual, atomic and independent documents and are not adapted to the Web. A promising research direction is to study the impact of Web structure on indexing and querying. Some approaches use Web structure for IR, but most of them treat it as a "bag-of-links", modelling the Web as a graph with HTML pages as nodes and hypertext links as edges, without taking link types into account. The HyperDocument model presented in this article is based on essential aspects of information description and comprehension: content, composition, linear or non-linear reading, and context. We present the main aspects of our Structured IR System for the Web.
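To make the contrast between a "bag-of-links" graph and a graph that keeps link types more concrete, here is a minimal Python sketch. It is not the HyperDocument model itself: the page names, the three link-type labels (`composition`, `sequence`, `reference`) and the helper functions are all hypothetical, chosen only to illustrate the aspects the abstract lists.

```python
from collections import defaultdict

# Hypothetical link-type labels, chosen for illustration only.
COMPOSITION = "composition"  # target is a part of the source document
SEQUENCE = "sequence"        # target is the next page in a linear reading path
REFERENCE = "reference"      # target is merely cited (non-linear reading)

# Hypothetical pages and typed links: (source, target, link_type).
typed_links = [
    ("index.html", "chapter1.html", COMPOSITION),
    ("index.html", "chapter2.html", COMPOSITION),
    ("chapter1.html", "chapter2.html", SEQUENCE),
    ("chapter2.html", "other-site.html", REFERENCE),
]


def bag_of_links(links):
    """Untyped graph: each page maps to the set of pages it links to."""
    graph = defaultdict(set)
    for source, target, _link_type in links:
        graph[source].add(target)
    return dict(graph)


def links_by_type(links):
    """Typed graph: each page maps link types to sets of target pages."""
    graph = defaultdict(lambda: defaultdict(set))
    for source, target, link_type in links:
        graph[source][link_type].add(target)
    return {page: dict(by_type) for page, by_type in graph.items()}


def parts_of(page, links):
    """Pages reachable from `page` through composition links only."""
    typed = links_by_type(links)
    seen, stack = set(), [page]
    while stack:
        current = stack.pop()
        for child in typed.get(current, {}).get(COMPOSITION, set()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


untyped = bag_of_links(typed_links)
document_parts = parts_of("index.html", typed_links)
```

In the untyped graph every edge looks the same, so an indexer cannot tell that `chapter1.html` and `chapter2.html` belong to one larger document while `other-site.html` is only cited. Keeping the type on each edge lets a function like `parts_of` group pages into a composite unit before indexing, under the assumption that link types can be determined at all.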
