Journal: Neurocomputing

Visual content-based web page categorization with deep transfer learning and metric learning

Abstract

The growing amounts of online multimedia content challenge the current search, recommendation and information retrieval systems. Information in the form of visual elements is highly valuable in a range of web mining tasks. However, the mining of these resources is a difficult task due to the complexity and variability of images, and the cost of collecting big enough datasets to successfully train accurate deep learning models. This paper proposes a novel framework for the categorization of web pages on the basis of their visual content. This is achieved by exploring the joint application of a transfer learning strategy and metric learning techniques to build a Deep Convolutional Neural Network (DCNN) for feature extraction, even when training data is scarce. The obtained experimental results evidence that the proposed approach outperforms the state-of-the-art handcrafted image descriptors and achieves a high categorization accuracy. In addition, we address the problem of over-time learning, so the proposed framework can learn to identify new web page categories as new labeled images are provided at test time. As a result, prior knowledge of the complete set of possible web categories is not necessary in the initial training phase. (C) 2019 Elsevier B.V. All rights reserved.
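The abstract does not say how new categories are added at test time. One common way to get that behaviour from a metric-learned embedding is a nearest-class-mean classifier, sketched below. This is an illustrative assumption, not the authors' method: the class name `PrototypeClassifier`, the toy 2-D vectors standing in for DCNN features, and the category labels are all hypothetical.

```python
import numpy as np


class PrototypeClassifier:
    """Nearest-class-mean classifier over precomputed embeddings.

    Illustrative only: embeddings are assumed to come from some
    feature extractor (for example a fine-tuned CNN), not shown here.
    """

    def __init__(self):
        self.sums = {}    # label -> running sum of embeddings
        self.counts = {}  # label -> number of embeddings seen

    def add_examples(self, embeddings, labels):
        """Register labeled embeddings; unseen labels create new classes."""
        for vec, label in zip(np.asarray(embeddings, dtype=float), labels):
            if label not in self.sums:
                self.sums[label] = np.zeros_like(vec)
                self.counts[label] = 0
            self.sums[label] += vec
            self.counts[label] += 1

    def predict(self, embeddings):
        """Return the label of the closest class mean (Euclidean distance)."""
        names = list(self.sums)
        protos = np.stack([self.sums[n] / self.counts[n] for n in names])
        queries = np.atleast_2d(np.asarray(embeddings, dtype=float))
        # Pairwise distances: shape (num_queries, num_classes)
        dists = np.linalg.norm(queries[:, None, :] - protos[None, :, :], axis=2)
        return [names[i] for i in dists.argmin(axis=1)]


# Toy usage with hypothetical categories and 2-D "embeddings".
clf = PrototypeClassifier()
clf.add_examples([[0.0, 0.0], [0.0, 1.0]], ["news", "news"])
clf.add_examples([[5.0, 5.0], [6.0, 5.0]], ["shop", "shop"])
# A new category is added later without retraining the feature extractor.
clf.add_examples([[-5.0, 5.0], [-6.0, 5.0]], ["blog", "blog"])
predictions = clf.predict([[-5.5, 5.0], [0.2, 0.4]])
```

Because each class is represented only by the mean of its embeddings, adding a category amounts to storing one more mean. The quality of the result depends entirely on how well the embedding separates categories, which is the part the paper addresses with transfer and metric learning.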
