Neurocomputing

Visual content-based web page categorization with deep transfer learning and metric learning

Abstract

The growing amount of online multimedia content challenges current search, recommendation and information retrieval systems. Information in the form of visual elements is highly valuable in a range of web mining tasks. However, mining these resources is difficult due to the complexity and variability of images and the cost of collecting datasets large enough to train accurate deep learning models. This paper proposes a novel framework for the categorization of web pages on the basis of their visual content. This is achieved by exploring the joint application of a transfer learning strategy and metric learning techniques to build a Deep Convolutional Neural Network (DCNN) for feature extraction, even when training data is scarce. The experimental results show that the proposed approach outperforms state-of-the-art handcrafted image descriptors and achieves high categorization accuracy. In addition, we address the problem of over-time learning, so that the proposed framework can learn to identify new web page categories as new labeled images are provided at test time. As a result, prior knowledge of the complete set of possible web categories is not necessary in the initial training phase. (C) 2019 Elsevier B.V. All rights reserved.
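The abstract does not state which metric-learning loss or which classifier the authors use on top of the DCNN features. As a minimal sketch under assumptions (not the paper's implementation), one common pairing is a triplet margin loss for shaping the embedding space and a nearest-class-mean ("prototype") classifier over the resulting embeddings, which lets a new category be registered at test time from a few labeled examples. The names `triplet_loss` and `PrototypeClassifier`, the 2-D vectors standing in for DCNN features, and the category labels are all hypothetical.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss: zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


class PrototypeClassifier:
    """Hypothetical nearest-class-mean classifier over embeddings.

    Categories can be added at any time by supplying labeled embeddings;
    the (assumed frozen) feature extractor is never retrained.
    """

    def __init__(self):
        self._sums = {}                  # label -> running sum of embeddings
        self._counts = defaultdict(int)  # label -> number of examples seen

    def add_examples(self, label, embeddings):
        for e in embeddings:
            if label not in self._sums:
                self._sums[label] = [0.0] * len(e)
            self._sums[label] = [s + x for s, x in zip(self._sums[label], e)]
            self._counts[label] += 1

    def prototype(self, label):
        """Mean embedding of all examples registered for `label`."""
        n = self._counts[label]
        return [s / n for s in self._sums[label]]

    def predict(self, embedding):
        """Return the label whose prototype is closest to `embedding`."""
        if not self._sums:
            raise ValueError("no categories registered")
        return min(self._sums, key=lambda lbl: euclidean(embedding, self.prototype(lbl)))


# Usage with made-up 2-D "embeddings" and hypothetical category names.
clf = PrototypeClassifier()
clf.add_examples("news", [[0.0, 1.0], [0.2, 0.8]])
clf.add_examples("shopping", [[1.0, 0.0], [0.9, 0.1]])
print(clf.predict([0.1, 0.9]))    # news

# A new category appears at test time: register it without retraining.
clf.add_examples("sports", [[-1.0, -1.0]])
print(clf.predict([-0.8, -0.9]))  # sports
```

Because adding a category only creates a new prototype, nothing upstream has to change. This is one plausible way to realize the over-time learning the abstract describes, assuming the metric-learned embedding generalizes to unseen categories.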