Published in: International World Wide Web Conference
Learning Block Importance Models for Web Pages

Abstract

Previous work shows that a web page can be partitioned into multiple segments, or blocks, and that the blocks in a page are often not equally important. It has also been shown that separating noisy or unimportant blocks from the rest of a page can facilitate web mining, search, and accessibility. However, no uniform approach or model has been presented for measuring the importance of different segments in web pages. Through a user study, we found that people do hold a consistent view of the importance of blocks in web pages. In this paper, we investigate how to find a model that automatically assigns importance values to the blocks in a web page. We formulate block importance estimation as a learning problem. First, we use a vision-based page segmentation algorithm to partition a web page into semantic blocks with a hierarchical structure. Then spatial features (such as position and size) and content features (such as the number of images and links) are extracted to construct a feature vector for each block. Based on these features, learning algorithms are used to train a model that assigns importance to the different segments of the web page. In our experiments, the best model achieves a Micro-F1 of 79% and a Micro-Accuracy of 85.9%, which is quite close to a person's judgment.
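
The feature-extraction and evaluation steps described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `Block` fields, the normalization by page size, and the exact metric definitions are not given in the abstract. The metrics follow the common micro-averaging convention, in which each importance level is treated as a one-vs-rest decision and the counts are pooled across levels; the paper's own definitions may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Block:
    """A segmented page block. Field names are hypothetical."""
    x: float          # left edge, in pixels
    y: float          # top edge, in pixels
    width: float      # block width, in pixels
    height: float     # block height, in pixels
    num_images: int   # content feature: number of images in the block
    num_links: int    # content feature: number of links in the block


def feature_vector(block: Block, page_width: float, page_height: float) -> List[float]:
    """Spatial features scaled by page size (an assumed normalization),
    followed by the raw content counts."""
    return [
        block.x / page_width,
        block.y / page_height,
        block.width / page_width,
        block.height / page_height,
        float(block.num_images),
        float(block.num_links),
    ]


def micro_scores(
    y_true: Sequence[int], y_pred: Sequence[int], labels: Sequence[int]
) -> Tuple[float, float]:
    """Return (micro-F1, micro-accuracy) using pooled one-vs-rest counts."""
    tp = fp = fn = tn = 0
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1
            elif pred == label:
                fp += 1
            elif truth == label:
                fn += 1
            else:
                tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return f1, accuracy


if __name__ == "__main__":
    block = Block(x=0, y=100, width=400, height=200, num_images=2, num_links=10)
    print(feature_vector(block, page_width=800, page_height=1000))
    print(micro_scores([1, 2, 3, 1], [1, 2, 1, 1], labels=[1, 2, 3]))
```

Under this convention, micro-accuracy also credits true negatives, so it is generally higher than micro-F1 on the same predictions. A trained model would sit between the two functions, mapping each feature vector to an importance level.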