IEEE Transactions on Pattern Analysis and Machine Intelligence
Text-Guided Neural Network Training for Image Recognition in Natural Scenes and Medicine

Abstract

Convolutional neural networks (CNNs) are widely recognized as the foundation for machine vision systems. Conventionally, CNNs are taught to understand images from training images with human-annotated labels alone, without any additional instructions. In this article, we take a new direction and explore guidance from text for neural network training. We present two versions of attention mechanisms that facilitate interactions between visual and semantic information and encourage CNNs to distill visual features effectively by leveraging semantic features. In contrast to dedicated text-image joint embedding methods, our method supports asynchronous training and inference behavior: a trained model can classify images irrespective of text availability. This characteristic substantially improves the model's scalability to multiple (multimodal) vision tasks. We also apply the proposed method to medical imaging, where it learns from richer clinical knowledge and achieves attention-based interpretable decision-making. With comprehensive validation on two natural and two medical datasets, we demonstrate that our method can effectively make use of semantic knowledge to improve CNN performance. Our method achieves substantial improvement on medical image datasets. Meanwhile, it achieves promising performance for multi-label image classification and caption-image retrieval, as well as excellent performance for phrase-based and multi-object localization on public benchmarks.
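
The abstract does not give the attention formulation, so the following is only a minimal sketch of the general idea, not the authors' architecture. It assumes scaled dot-product attention over image-region features, with a fallback to uniform weights when no text embedding is supplied, to illustrate how one model could accept images with or without text. The function names `softmax` and `attend`, the feature layout, and the fallback rule are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(regions, text=None):
    """Pool region features into a single vector (hypothetical helper).

    regions: list of equal-length feature vectors, one per image region.
    text:    optional embedding of the same length; None means no text
             is available, as may be the case at inference time.
    Returns (pooled_vector, attention_weights).
    """
    dim = len(regions[0])
    if text is None:
        # Image-only path: uniform weights, i.e. plain average pooling.
        weights = [1.0 / len(regions)] * len(regions)
    else:
        # Text-guided path (assumed form): scaled dot-product scores
        # between each region and the text embedding.
        scores = [
            sum(r * t for r, t in zip(region, text)) / math.sqrt(dim)
            for region in regions
        ]
        weights = softmax(scores)
    pooled = [
        sum(w * region[i] for w, region in zip(weights, regions))
        for i in range(dim)
    ]
    return pooled, weights
```

Calling `attend(regions)` averages the regions, while `attend(regions, text=embedding)` shifts weight toward regions that align with the text. The returned weights are the kind of quantity an attention-based interpretation would inspect.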