Multi-modal visual question answering system

Abstract

The systems and methods described herein may generate multi-modal embeddings with sub-symbolic features and symbolic features. The sub-symbolic embeddings may be generated with computer vision processing. The symbolic features may include mathematical representations of image content, which are enriched with information from background knowledge sources. The system may aggregate the sub-symbolic and symbolic features using aggregation techniques such as concatenation, averaging, summing, and/or maxing. The multi-modal embeddings may be included in a multi-modal embedding model and trained via supervised learning. Once the multi-modal embeddings are trained, the system may generate inferences based on linear algebra operations involving the multi-modal embeddings that are relevant to an inference response to the natural language question and input image.
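The abstract names four ways of aggregating the sub-symbolic and symbolic features (concatenation, averaging, summing, and maxing) but does not say how they are implemented. The following is a minimal sketch under stated assumptions, not the patented method: both feature sets are taken to be plain Python lists of floats, the element-wise variants are assumed to require equal-length vectors, and every name (`aggregate`, `AGGREGATORS`, and so on) is hypothetical.

```python
from typing import Callable, Dict, List

Vector = List[float]


def _elementwise(op: Callable[[float, float], float], a: Vector, b: Vector) -> Vector:
    # Assumption: element-wise aggregation is only defined for equal-length vectors.
    if len(a) != len(b):
        raise ValueError("element-wise aggregation needs vectors of equal length")
    return [op(x, y) for x, y in zip(a, b)]


def concatenate(sub_symbolic: Vector, symbolic: Vector) -> Vector:
    # Join the two vectors end to end; their lengths may differ.
    return list(sub_symbolic) + list(symbolic)


def average(sub_symbolic: Vector, symbolic: Vector) -> Vector:
    # Mean of each pair of corresponding components.
    return _elementwise(lambda x, y: (x + y) / 2.0, sub_symbolic, symbolic)


def summing(sub_symbolic: Vector, symbolic: Vector) -> Vector:
    # Sum of each pair of corresponding components.
    return _elementwise(lambda x, y: x + y, sub_symbolic, symbolic)


def maxing(sub_symbolic: Vector, symbolic: Vector) -> Vector:
    # Larger of each pair of corresponding components.
    return _elementwise(max, sub_symbolic, symbolic)


# Hypothetical lookup table mapping a method name to its aggregation function.
AGGREGATORS: Dict[str, Callable[[Vector, Vector], Vector]] = {
    "concatenate": concatenate,
    "average": average,
    "sum": summing,
    "max": maxing,
}


def aggregate(sub_symbolic: Vector, symbolic: Vector, method: str = "concatenate") -> Vector:
    # Combine the two feature vectors into one multi-modal vector.
    if method not in AGGREGATORS:
        raise ValueError(f"unknown aggregation method: {method}")
    return AGGREGATORS[method](sub_symbolic, symbolic)
```

Concatenation keeps every component of both inputs at the cost of a longer vector, while the three element-wise variants keep the dimensionality fixed but need the two feature sets projected to the same length beforehand.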

Bibliographic details

Publication number: US10949718B2

Patent type:
Publication date: 2021-03-16

Original format: PDF
Applicant/Patentee: ACCENTURE GLOBAL SOLUTIONS LIMITED

Application number: US201916406380
Inventors: LUCA COSTABELLO; NICHOLAS MCCARTHY; RORY MCGRATH; SUMIT PAI

Filing date: 2019-05-08
Classification: G06K9/72; G06F3/0484; G06K9/62; G06F40/30
Country: US
Record added: 2024-06-14 21:22:18
