Conference: IEEE/CVF Conference on Computer Vision and Pattern Recognition

Tensorize, Factorize and Regularize: Robust Visual Relationship Learning


Abstract

Visual relationships provide higher-level information about objects and their relations in an image - this enables a semantic understanding of the scene and helps downstream applications. Given a set of localized objects in some training data, visual relationship detection seeks to detect the most likely 'relationship' between objects in a given image. While the specific objects may be well represented in training data, their relationships may still be infrequent. The empirical distribution obtained from seeing these relationships in a dataset does not model the underlying distribution well - a serious issue for most learning methods. In this work, we start from a simple multi-relational learning model, which, in principle, offers a rich formalization for deriving a strong prior for learning visual relationships. While the inference problem for deriving the regularizer is challenging, our main technical contribution is to show how adapting recent results in numerical linear algebra leads to efficient algorithms for a factorization scheme that yields highly informative priors. The factorization provides sample size bounds for inference (under mild conditions) for the underlying [object, predicate, object] relationship learning task on its own and, surprisingly, outperforms existing methods in some cases even without utilizing visual features. Then, when integrated with an end-to-end architecture for visual relationship detection that leverages image data, we substantially improve the state of the art.
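As a rough illustration of the idea of a factorization-derived prior over [object, predicate, object] triples, the sketch below builds a count tensor from training triples and smooths each predicate slice with a truncated SVD. This is not the paper's algorithm: the abstract does not specify the factorization, so the per-slice truncated SVD is a stand-in technique, and the function name `low_rank_prior`, its parameters, and the smoothing constant are all hypothetical.

```python
import numpy as np


def low_rank_prior(triples, n_objects, n_predicates, rank, smoothing=1e-3):
    """Return a prior P(predicate | first object, second object).

    Assumption: `triples` is an iterable of (first_obj, predicate, second_obj)
    integer indices. The low-rank step is a plain truncated SVD per predicate
    slice, used here only as a simple stand-in for a tensor factorization.
    """
    # Count tensor indexed as counts[predicate, first_obj, second_obj].
    counts = np.zeros((n_predicates, n_objects, n_objects))
    for first, pred, second in triples:
        counts[pred, first, second] += 1.0

    # Batched SVD: one decomposition per predicate slice.
    u, sing, vt = np.linalg.svd(counts, full_matrices=False)

    # Keep only the leading `rank` components of every slice.
    approx = (u[:, :, :rank] * sing[:, None, :rank]) @ vt[:, :rank, :]

    # Clip negatives introduced by truncation, add a small floor so unseen
    # pairs keep nonzero mass, then normalize over the predicate axis.
    scores = np.clip(approx, 0.0, None) + smoothing
    return scores / scores.sum(axis=0, keepdims=True)
```

A prior of this kind could, for example, be multiplied into (or added in log space to) the predicate scores of a visual model, so that rarely observed relationships borrow statistical strength from the low-rank structure; how the paper actually integrates its prior is not described in the abstract.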
