Conference: International World Wide Web Conference
Browsing on Small Screens: Recasting Web-Page Segmentation into an Efficient Machine Learning Framework

Abstract

Fitting enough information from web pages to make browsing on small screens compelling is a challenging task. One approach is to present the user with a thumbnail image of the full web page and allow the user to press a single key to zoom into a region (which may then be transcoded into WML/XHTML, summarized, etc.). However, if the regions for zooming are chosen naively, the experience is frustrating because many coherent regions, sentences, images, and words may be inadvertently split apart. Here, we cast the web page segmentation problem into a machine learning framework, where we re-examine this task through the lens of entropy reduction and decision tree learning. This yields an efficient and effective page segmentation algorithm. We demonstrate how simple techniques from computer vision can be used to fine-tune the results. The resulting segmentation keeps coherent regions together when tested on a broad set of complex web pages.
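
The abstract does not spell out the algorithm, so the following is only a minimal sketch of what choosing a cut by entropy reduction could look like, in the style of a decision tree split. It assumes each page element is given as a labeled bounding box and that a good cut is one that leaves each side dominated by few elements, weighted by area. The `Rect` record, the function names, and the area weighting are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical layout record: (element label, x0, y0, x1, y1) in pixels.
Rect = Tuple[str, float, float, float, float]


def entropy(areas: Dict[str, float]) -> float:
    """Shannon entropy (bits) of the area distribution over element labels."""
    total = sum(areas.values())
    if total <= 0:
        return 0.0
    h = 0.0
    for a in areas.values():
        if a > 0:
            p = a / total
            h -= p * math.log2(p)
    return h


def split_areas(rects: List[Rect], axis: str, cut: float):
    """Area of each element falling before and after a cut along 'x' or 'y'."""
    before: Dict[str, float] = {}
    after: Dict[str, float] = {}
    for label, x0, y0, x1, y1 in rects:
        lo, hi = (x0, x1) if axis == "x" else (y0, y1)
        extent = (y1 - y0) if axis == "x" else (x1 - x0)
        a_before = max(0.0, min(hi, cut) - lo) * extent
        a_after = max(0.0, hi - max(lo, cut)) * extent
        if a_before > 0:
            before[label] = before.get(label, 0.0) + a_before
        if a_after > 0:
            after[label] = after.get(label, 0.0) + a_after
    return before, after


def best_cut(rects: List[Rect], axis: str,
             candidates: List[float]) -> Tuple[Optional[float], float]:
    """Return (cut, gain) for the candidate with the largest entropy reduction."""
    whole: Dict[str, float] = {}
    for label, x0, y0, x1, y1 in rects:
        whole[label] = whole.get(label, 0.0) + (x1 - x0) * (y1 - y0)
    total = sum(whole.values())
    base = entropy(whole)
    best: Tuple[Optional[float], float] = (None, float("-inf"))
    for cut in candidates:
        before, after = split_areas(rects, axis, cut)
        w_before = sum(before.values()) / total
        w_after = sum(after.values()) / total
        # Area-weighted entropy remaining after the cut.
        remaining = w_before * entropy(before) + w_after * entropy(after)
        gain = base - remaining
        if gain > best[1]:
            best = (cut, gain)
    return best


if __name__ == "__main__":
    # Two side-by-side elements; a cut at x=50 separates them cleanly.
    page = [("nav", 0, 0, 50, 100), ("article", 50, 0, 100, 100)]
    print(best_cut(page, "x", [25, 50, 75]))
```

In this toy page, a cut through the middle of either element leaves one side mixed, while the cut on the shared boundary leaves both sides pure, so the boundary cut has the highest gain. Applying such a selection recursively to each resulting region would give a tree of cuts, which is one plausible reading of the decision tree framing.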