Journal of software

Segmenting Webpage with Gomory-Hu Tree Based Clustering

Abstract

We propose a novel web page segmentation algorithm based on finding the Gomory-Hu tree of a planar graph. The algorithm first extracts visual and structural information from a web page to construct a weighted undirected graph whose vertices are the leaf nodes of the DOM tree and whose edges represent the visual positional relationships between vertices. It then partitions the graph with a Gomory-Hu tree based clustering algorithm. Experimental results show that our algorithm achieves much higher precision and recall than both VIPS and Chakrabarti et al.’s graph theoretic algorithm, and that its running time is far lower than that of Chakrabarti et al.’s algorithm.
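
The partitioning step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how edge weights are derived or how the tree is turned into clusters. Here the tree is built with Gusfield's general-purpose method on top of an Edmonds-Karp max-flow (it does not exploit planarity), clusters are formed by dropping the k-1 lightest tree edges (an assumed rule), and the toy graph and the names `build_capacity`, `min_cut`, `gomory_hu_tree` and `cluster` are hypothetical.

```python
from collections import deque

def build_capacity(n, edges):
    """Symmetric capacity matrix for an undirected weighted graph."""
    cap = [[0] * n for _ in range(n)]
    for u, v, w in edges:
        cap[u][v] += w
        cap[v][u] += w
    return cap

def min_cut(n, cap, s, t):
    """Edmonds-Karp max-flow; returns (cut value, vertices on s's side)."""
    res = [row[:] for row in cap]  # residual capacities
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and res[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            # No augmenting path left: vertices reached from s form the cut side.
            return flow, {v for v in range(n) if parent[v] != -1}
        bottleneck = float("inf")
        v = t
        while v != s:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            res[parent[v]][v] -= bottleneck
            res[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck

def gomory_hu_tree(n, cap):
    """Gusfield's method: n-1 min-cut computations on the original graph."""
    p = [0] * n   # tree parent of each vertex (vertex 0 is the root)
    fl = [0] * n  # weight of the tree edge (i, p[i])
    for s in range(1, n):
        t = p[s]
        f, side = min_cut(n, cap, s, t)
        fl[s] = f
        for i in range(n):
            if i != s and i in side and p[i] == t:
                p[i] = s
        if p[t] in side:
            p[s], p[t] = p[t], s
            fl[s], fl[t] = fl[t], f
    return [(i, p[i], fl[i]) for i in range(1, n)]

def cluster(n, tree, k):
    """Assumed rule: drop the k-1 lightest tree edges, return the components."""
    kept = sorted(tree, key=lambda e: e[2])[k - 1:]
    label = list(range(n))
    def find(x):
        while label[x] != x:
            label[x] = label[label[x]]
            x = label[x]
        return x
    for u, v, _ in kept:
        label[find(u)] = find(v)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Toy graph (made-up weights): two tight triangles joined by one weak edge.
edges = [(0, 1, 5), (0, 2, 5), (1, 2, 5),
         (3, 4, 5), (3, 5, 5), (4, 5, 5),
         (2, 3, 1)]
cap = build_capacity(6, edges)
tree = gomory_hu_tree(6, cap)
print(cluster(6, tree, 2))
```

In this toy graph the only light tree edge is the one that crosses between the two triangles, so asking for two clusters separates them. In the setting the abstract describes, the vertices would be DOM leaf nodes and the weights would come from the extracted visual and structural information.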