Conference: Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

A Web Page Topic Segmentation Algorithm Based on Visual Criteria and Content Layout

Abstract

This paper presents experiments with a web page topic segmentation algorithm that show a significant improvement in precision when retrieving documents from the TREC 2001 Web track corpus. Instead of processing the whole document, the algorithm segments a web page into semantic blocks according to visual criteria (such as horizontal lines and colors) and structural tags (such as heading and paragraph tags). We conclude that combining visual and content layout criteria gives the best precision gains: the page's rank is computed over the relevant segments produced by the segmentation algorithm.
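
The abstract describes the method only at a high level. The Python sketch below illustrates the general idea under stated assumptions, not the authors' implementation: block boundaries are `<hr>`, headings, paragraphs, and any element that sets a background color; each segment is scored by the fraction of distinct query terms it contains; and the page is ranked by its best segment. The names `BlockSegmenter`, `segment`, `segment_score`, and `page_score`, the boundary rules, and the scoring function are all hypothetical choices made for illustration.

```python
import re
from html.parser import HTMLParser

# Assumed boundary cues (an illustrative choice, not the paper's exact rules):
# visual: <hr> and any element that sets a background color;
# structural: headings and paragraphs.
BOUNDARY_TAGS = {"hr", "p", "h1", "h2", "h3", "h4", "h5", "h6"}


class BlockSegmenter(HTMLParser):
    """Hypothetical segmenter: splits a page into text blocks at boundary cues."""

    def __init__(self):
        super().__init__()
        self.blocks = [[]]  # each block is a list of text fragments
        self.skip = 0       # > 0 while inside <script> or <style>

    def _start_block(self):
        # Open a new block only if the current one already holds text,
        # so consecutive boundaries do not create empty blocks.
        if self.blocks[-1]:
            self.blocks.append([])

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip += 1
            return
        attrs = dict(attrs)  # attribute names arrive lowercased
        style = (attrs.get("style") or "").lower()
        sets_color = "bgcolor" in attrs or "background" in style
        if tag in BOUNDARY_TAGS or sets_color:
            self._start_block()

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip:
            self.skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip:
            self.blocks[-1].append(text)


def segment(html):
    """Returns the page's non-empty text blocks in document order."""
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    return [" ".join(block) for block in parser.blocks if block]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def segment_score(block, query_terms):
    """Hypothetical score: fraction of distinct query terms found in the block."""
    if not query_terms:
        return 0.0
    return len(query_terms & set(tokenize(block))) / len(query_terms)


def page_score(html, query):
    """Ranks a page by its best-matching segment instead of its whole text."""
    terms = set(tokenize(query))
    return max((segment_score(b, terms) for b in segment(html)), default=0.0)


SAMPLE_PAGE = """
<h1>Jaguar</h1><p>The jaguar is a large cat native to the Americas.</p>
<hr>
<div style="background-color:#eee">Buy jaguar car parts here. Cars, parts, deals.</div>
"""

if __name__ == "__main__":
    print(segment(SAMPLE_PAGE))
    print(page_score(SAMPLE_PAGE, "cat parts"))          # best single segment
    print(segment_score(" ".join(segment(SAMPLE_PAGE)),  # whole page at once
                        set(tokenize("cat parts"))))
```

On the sample page, the query `cat parts` matches both terms somewhere in the page, so a whole-document score is 1.0, but no single segment contains both terms, so the segment-based score is 0.5. This illustrates how scoring at the segment level can demote pages whose query terms are scattered across unrelated blocks.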
