Conference: International World Wide Web Conference; Edinburgh (GB)

Browsing on Small Screens: Recasting Web-Page Segmentation into an Efficient Machine Learning Framework

Abstract

Fitting enough information from web pages onto a small screen to make browsing compelling is a challenging task. One approach is to present the user with a thumbnail image of the full web page and let the user press a single key to zoom into a region (which may then be transcoded into WML/XHTML, summarized, etc.). However, if the regions for zooming are chosen naively, the experience is frustrating because many coherent regions, sentences, images, and words may be inadvertently split. Here, we cast the web page segmentation problem into a machine learning framework, re-examining the task through the lens of entropy reduction and decision tree learning. This yields an efficient and effective page segmentation algorithm. We demonstrate how simple techniques from computer vision can be used to fine-tune the results. The resulting segmentation keeps coherent regions together when tested on a broad set of complex web pages.
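
The abstract names entropy reduction and decision tree learning but does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the page thumbnail has been rasterized into a grid in which each cell carries the label of the page element covering it (a hypothetical representation), and it greedily picks the horizontal or vertical cut that most reduces label entropy, as a decision tree learner would. The names `entropy`, `best_cut`, and `segment` are illustrative.

```python
import math
from collections import Counter


def entropy(cells):
    """Shannon entropy (in bits) of the labels in a flat sequence of cells."""
    total = len(cells)
    if total == 0:
        return 0.0
    counts = Counter(cells)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def flatten(grid, top, bottom, left, right):
    """Labels of the cells in rows [top, bottom) and columns [left, right)."""
    return [grid[r][c] for r in range(top, bottom) for c in range(left, right)]


def best_cut(grid, top, bottom, left, right):
    """Return (gain, axis, position) of the cut with the largest entropy
    reduction, or None if the region is a single cell."""
    parent = flatten(grid, top, bottom, left, right)
    base, n = entropy(parent), len(parent)
    candidates = []
    for r in range(top + 1, bottom):  # horizontal cuts
        parts = (flatten(grid, top, r, left, right),
                 flatten(grid, r, bottom, left, right))
        candidates.append(("row", r, parts))
    for c in range(left + 1, right):  # vertical cuts
        parts = (flatten(grid, top, bottom, left, c),
                 flatten(grid, top, bottom, c, right))
        candidates.append(("col", c, parts))
    best = None
    for axis, pos, (a, b) in candidates:
        weighted = len(a) / n * entropy(a) + len(b) / n * entropy(b)
        gain = base - weighted
        if best is None or gain > best[0]:
            best = (gain, axis, pos)
    return best


def segment(grid, top, bottom, left, right, max_depth):
    """Recursively split a region; returns (top, bottom, left, right) tuples."""
    cut = best_cut(grid, top, bottom, left, right) if max_depth > 0 else None
    if cut is None or cut[0] <= 1e-12:  # nothing left to gain: keep region whole
        return [(top, bottom, left, right)]
    _, axis, pos = cut
    if axis == "row":
        return (segment(grid, top, pos, left, right, max_depth - 1)
                + segment(grid, pos, bottom, left, right, max_depth - 1))
    return (segment(grid, top, bottom, left, pos, max_depth - 1)
            + segment(grid, top, bottom, pos, right, max_depth - 1))


# Toy page (assumed labels): H = header, N = navigation, C = content.
page = ["HHHH",
        "NCCC",
        "NCCC"]
regions = segment(page, 0, len(page), 0, len(page[0]), max_depth=3)
print(regions)  # [(0, 1, 0, 4), (1, 3, 0, 1), (1, 3, 1, 4)]
```

On this toy grid the first cut separates the header row, and the second separates the navigation column from the content, so no labeled region is split. A real page would need a richer labeling and a stopping rule tuned to the screen size; both are outside what the abstract states.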
