Journal: World Wide Web

Large-scale holistic approach to Web block classification: assembling the jigsaws of a Web page puzzle

Abstract

Web blocks are ubiquitous across the Web. Navigation menus, advertisements, headers, footers, and sidebars can be found on almost any website. Identifying these blocks can be of significant importance for tasks such as wrapper induction, assistance to visually impaired people, Web page topic clustering, and Web search, among others. There have been several approaches to the problem of Web block classification, but they focused on specific types of blocks and tried to classify all of them with a single set of features. In our approach, each classifier has its own unique, extendable set of features; the features are extracted through the declarative BERyL language, and the classification itself is done by applying machine learning to these feature sets. We propose to take a holistic view of the page, in which all block classifiers in the classification system interact with each other and the accuracy of individual classifiers improves through this interaction. The holistic approach to Web block classification is implemented through a system of constraints in our block classification system, BERyL. In our evaluation, the BERyL classification system with the holistic approach applied achieves higher F1 scores than individual non-connected classifiers, with an average F1 of 98%. We also consider the distinction between the classification of domain-independent and domain-dependent blocks and propose a large-scale solution to the classification problem for both of these block types.
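
The idea of classifiers interacting through constraints can be illustrated with a minimal Python sketch. This is not BERyL's actual constraint system or feature language, which the abstract does not detail. The `Block` type, the per-label scores, the 0.5 threshold, and the two constraints (at most one header and one footer per page; no block is both header and footer) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Block:
    block_id: str
    # label -> confidence from that label's own classifier (hypothetical values)
    scores: Dict[str, float]


def independent_labels(blocks: List[Block], threshold: float = 0.5) -> Dict[str, Set[str]]:
    """Label each block with every classifier taken in isolation."""
    return {
        b.block_id: {label for label, s in b.scores.items() if s >= threshold}
        for b in blocks
    }


def holistic_labels(blocks: List[Block], threshold: float = 0.5) -> Dict[str, Set[str]]:
    """Revise the independent labels using page-level constraints (assumed)."""
    labels = independent_labels(blocks, threshold)
    by_id = {b.block_id: b for b in blocks}

    # Assumed constraint 1: a page has at most one header and one footer.
    # Keep the highest-scoring claimant and drop the label from the rest.
    for unique in ("header", "footer"):
        claimed = [bid for bid, labs in labels.items() if unique in labs]
        if len(claimed) > 1:
            best = max(claimed, key=lambda bid: by_id[bid].scores[unique])
            for bid in claimed:
                if bid != best:
                    labels[bid].discard(unique)

    # Assumed constraint 2: no block is both header and footer.
    # Keep whichever of the two labels has the higher score.
    for bid, labs in labels.items():
        if {"header", "footer"} <= labs:
            s = by_id[bid].scores
            labs.discard("footer" if s["header"] >= s["footer"] else "header")

    return labels


page = [
    Block("b1", {"header": 0.9, "footer": 0.1}),
    Block("b2", {"header": 0.6, "navigation": 0.8}),
    Block("b3", {"footer": 0.7}),
]
print(independent_labels(page))
print(holistic_labels(page))
```

In this toy page, the isolated header classifier accepts both `b1` and `b2`, while the constrained pass keeps the header label only on the higher-scoring `b1` and leaves `b2` as navigation. A real system would learn the scores from extracted features rather than hard-coding them.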
