Journal: Journal of the American Society for Information Science and Technology

Visual Webpage Block Importance Prediction Using Conditional Random Fields


Abstract

We have developed a system that segments web pages into blocks and predicts each block's importance, a task we call block importance prediction (BIP). First, we use VIPS (Vision-based Page Segmentation) to partition a page into a tree of blocks; we then extract features from each block and label all leaf nodes. This paper makes two main contributions. First, we pioneer the formulation of BIP as a sequence tagging task: a depth-first search (DFS) traversal outputs a single sequence for the whole tree in which related sub-blocks are adjacent. Second, we use the conditional random fields (CRF) model to label these sequences. CRF's transition features model correlations between neighboring labels well, and CRF labels all blocks in a sequence simultaneously, finding the globally optimal labeling for the whole sequence rather than only the best label for each block. In our experiments, our CRF-based system achieves an F1-measure of 97.41%, significantly outperforming our ME-based baseline (95.64%). Lastly, we tested the CRF-based system on sites not covered by the training data. On completely novel sites, CRF performed slightly worse than ME. However, when given only two training pages from a given site, CRF improved almost three times as much as ME.
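The two ideas in the abstract, linearizing the block tree by depth-first traversal and labeling the whole sequence jointly, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the `Block` class, the toy page, the label names `important` and `noise`, and the hand-set integer scores. It does not reimplement VIPS, the paper's features, or CRF training; it only shows Viterbi decoding, the standard way a linear-chain model picks the best label sequence, next to a per-block choice.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Block:
    """Hypothetical block-tree node (stands in for a VIPS output tree)."""
    name: str
    children: List["Block"] = field(default_factory=list)


def leaf_sequence(root: Block) -> List[Block]:
    """Depth-first traversal returning leaf blocks in visiting order,
    so leaves that share a parent end up adjacent in the sequence."""
    if not root.children:
        return [root]
    leaves: List[Block] = []
    for child in root.children:
        leaves.extend(leaf_sequence(child))
    return leaves


def viterbi(emissions: List[Dict[str, int]],
            transitions: Dict[Tuple[str, str], int],
            labels: List[str]) -> List[str]:
    """Return the label sequence with the highest total score
    (per-block scores plus scores for each adjacent label pair)."""
    # best[i][y]: best score of any labeling of positions 0..i ending in y
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []
    for scores in emissions[1:]:
        row, ptr = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transitions[(p, y)])
            row[y] = best[-1][prev] + transitions[(prev, y)] + scores[y]
            ptr[y] = prev
        best.append(row)
        back.append(ptr)
    path = [max(labels, key=lambda y: best[-1][y])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy page: three navigation leaves followed by two content leaves.
page = Block("page", [
    Block("nav", [Block("nav-1"), Block("nav-2"), Block("nav-3")]),
    Block("main", [Block("title"), Block("body")]),
])
leaves = leaf_sequence(page)

LABELS = ["important", "noise"]  # assumed label set, for illustration only
# Hand-set per-block scores; "nav-2" looks slightly important in isolation.
emissions = [
    {"important": 0, "noise": 10},   # nav-1
    {"important": 6, "noise": 4},    # nav-2
    {"important": 0, "noise": 10},   # nav-3
    {"important": 10, "noise": 0},   # title
    {"important": 10, "noise": 0},   # body
]
# Neighboring blocks with the same label get a bonus of 5.
transitions = {(a, b): 5 if a == b else 0 for a in LABELS for b in LABELS}

greedy = [max(LABELS, key=lambda y: e[y]) for e in emissions]
joint = viterbi(emissions, transitions, LABELS)
print([b.name for b in leaves])
print("per-block:", greedy)
print("joint:    ", joint)
```

With these toy scores, the per-block choice labels `nav-2` as `important`, while joint decoding labels it `noise` because both of its neighbors are noise. This is the kind of neighbor correlation the abstract attributes to CRF's transition features.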

Bibliographic record

  • Author affiliation (identical for all three entries listed): Department of Computer Science and Engineering, Yuan-Ze University, 135 Yuan-Tung Road, Chungli, Taoyuan, Taiwan
  • Original format: PDF
  • Language: English
