Journal: IEEE Transactions on Consumer Electronics

Repetition-based web page segmentation by detecting tag patterns for small-screen devices

Abstract

Segmenting a Web page into logical blocks is an important preprocessing step for recognizing the informative content blocks in a page, which in turn enables efficient information extraction and convenient display on devices with small-sized screens. Previous methods for Web page segmentation are not flexible in a dynamic Web environment because they rely largely on heuristic rules generated by exploiting the structural tags and visual information inherent in a page. To resolve this problem, this paper proposes a new method of Web page segmentation that recognizes repetitive tag patterns, called key patterns, in the DOM tree structure of a page. We report on the Repetition-based Page Segmentation (REPS) algorithm, which detects key patterns in a page and generates virtual nodes to correctly segment nested blocks. A series of experiments performed on real Web sites showed that REPS greatly contributes to improving the correctness of Web page segmentation.
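The abstract describes detecting repeated tag patterns among DOM nodes and grouping each repetition into a virtual node. The following is a minimal Python sketch of that general idea, not the REPS algorithm as published: the abstract gives no algorithmic details, so the names `TreeBuilder`, `find_key_pattern`, `segment`, and `child_tags`, the "largest consecutive repetition" criterion, and the single-level grouping are all assumptions made for illustration. It also assumes well-formed HTML and does not handle nested blocks.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for the sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a bare tag tree (no text, no attributes) from well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Assumption: input is well formed, so the end tag closes the
        # innermost open element.
        if self.current.tag == tag and self.current.parent is not None:
            self.current = self.current.parent


def find_key_pattern(tags, min_repeats=2):
    """Return (start, pattern, repeats) for the consecutive repetition that
    covers the most tags, or None if nothing repeats min_repeats times.
    Ties go to the shortest pattern, then to the earliest start."""
    best = None
    n = len(tags)
    for length in range(1, n // min_repeats + 1):
        for start in range(0, n - length * min_repeats + 1):
            pattern = tags[start:start + length]
            repeats = 1
            while tags[start + repeats * length:
                       start + (repeats + 1) * length] == pattern:
                repeats += 1
            covered = repeats * length
            if repeats >= min_repeats and (best is None or covered > best[0]):
                best = (covered, start, pattern, repeats)
    if best is None:
        return None
    return best[1], best[2], best[3]


def segment(tags):
    """Group sibling tags into blocks: each repetition of the detected
    pattern becomes one 'virtual node'; other siblings stay on their own."""
    found = find_key_pattern(tags)
    if found is None:
        return [[t] for t in tags]
    start, pattern, repeats = found
    size = len(pattern)
    blocks = [[t] for t in tags[:start]]
    for i in range(repeats):
        blocks.append(tags[start + i * size:start + (i + 1) * size])
    blocks += [[t] for t in tags[start + repeats * size:]]
    return blocks


def child_tags(html):
    """Return the tag names of the children of the first top-level element."""
    builder = TreeBuilder()
    builder.feed(html)
    return [child.tag for child in builder.root.children[0].children]


if __name__ == "__main__":
    page = ("<div><h2>News</h2><h3>A</h3><p>a</p><h3>B</h3><p>b</p>"
            "<h3>C</h3><p>c</p><div>footer</div></div>")
    for block in segment(child_tags(page)):
        print(block)
```

In this sketch the sibling sequence `h2, h3, p, h3, p, h3, p, div` yields the pattern `h3, p`, so each heading is grouped with its paragraph while the leading `h2` and trailing `div` remain separate blocks. Segmenting nested blocks would require applying the grouping recursively through the tree, which is left out here.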
