Conference: ACM international conference on Multimedia

Extracting informative images from web news pages via imbalanced classification


Abstract

In this paper, we propose an imbalanced classification algorithm to extract informative images from web news pages. Our algorithm addresses this difficult problem with two approaches. First, we limit our dataset to a specific application area so that the patterns of informative images can be captured by existing classification algorithms. Second, we propose an automatic negative-sample filtering algorithm that eliminates most negative samples, so that the classification training data is rebalanced. Because most classification algorithms perform worse on imbalanced training data, our algorithm improves overall performance significantly. In addition, our approach is inherently robust to new web sites and to style or layout changes of web sites.
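
The rebalancing idea in the abstract can be illustrated with a minimal sketch. The abstract does not describe the actual filtering rule, so everything below is an assumption made for illustration: the `ImageCandidate` type, the `is_obvious_negative` rule (dropping very small or very elongated images such as icons and banners), and the `min_side` and `max_aspect` thresholds are hypothetical and are not taken from the paper. The sketch only shows how discarding obvious negatives before training raises the share of positive samples.

```python
from dataclasses import dataclass


@dataclass
class ImageCandidate:
    """One <img> found on a news page (hypothetical structure)."""
    url: str
    width: int
    height: int
    informative: bool  # ground-truth label used for training


def is_obvious_negative(img, min_side=100, max_aspect=4.0):
    """Hypothetical rule: treat tiny or very elongated images as negatives."""
    short, long_ = sorted((img.width, img.height))
    if short < min_side:  # icons, spacers, thin banners
        return True
    return long_ / short > max_aspect  # strips and separators


def filter_negatives(candidates):
    """Drop obvious negatives so the remaining training set is less skewed."""
    return [c for c in candidates if not is_obvious_negative(c)]


def positive_ratio(candidates):
    """Fraction of candidates labeled informative (0.0 for an empty list)."""
    if not candidates:
        return 0.0
    return sum(c.informative for c in candidates) / len(candidates)


# Made-up example page: one informative photo among typical page clutter.
page = [
    ImageCandidate("photo.jpg", 640, 480, True),
    ImageCandidate("icon.png", 16, 16, False),
    ImageCandidate("banner.gif", 728, 90, False),
    ImageCandidate("spacer.gif", 1, 1, False),
    ImageCandidate("ad.jpg", 300, 250, False),
]
kept = filter_negatives(page)
```

On this made-up page, the filter keeps only `photo.jpg` and `ad.jpg`, so a classifier trained afterwards sees a far less skewed set than the original five candidates. A real filter would need to be checked to confirm that it rarely discards informative images.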
