Multi-modal visual question answering system

Abstract

The systems and methods described herein may generate multi-modal embeddings with sub-symbolic features and symbolic features. The sub-symbolic embeddings may be generated with computer vision processing. The symbolic features may include mathematical representations of image content, which are enriched with information from background knowledge sources. The system may aggregate the sub-symbolic and symbolic features using aggregation techniques such as concatenation, averaging, summing, and/or taking the maximum. The multi-modal embeddings may be included in a multi-modal embedding model and trained via supervised learning. Once the multi-modal embeddings are trained, the system may generate inferences through linear algebra operations on the multi-modal embeddings that are relevant to answering the natural language question about the input image.
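
The aggregation step named in the abstract can be illustrated with a minimal sketch. This is not the patented implementation: the function name `aggregate`, the `method` strings, and the toy three-dimensional vectors are assumptions made here for illustration, and the element-wise variants assume that both feature vectors have already been projected to the same length.

```python
from typing import List, Sequence

Vector = List[float]


def aggregate(sub_symbolic: Sequence[float],
              symbolic: Sequence[float],
              method: str = "concat") -> Vector:
    """Combine a sub-symbolic and a symbolic feature vector (hypothetical helper)."""
    if method == "concat":
        # Concatenation keeps both vectors intact, so their lengths may differ.
        return list(sub_symbolic) + list(symbolic)

    # The remaining methods work element by element and need equal lengths.
    if len(sub_symbolic) != len(symbolic):
        raise ValueError("element-wise aggregation needs equal-length vectors")

    pairs = list(zip(sub_symbolic, symbolic))
    if method == "sum":
        return [a + b for a, b in pairs]
    if method == "average":
        return [(a + b) / 2 for a, b in pairs]
    if method == "max":
        return [max(a, b) for a, b in pairs]
    raise ValueError(f"unknown aggregation method: {method}")


# Toy inputs (assumed values): one vector standing in for computer vision
# features, one standing in for knowledge-enriched symbolic features.
visual = [1.0, 4.0, 2.0]
knowledge = [3.0, 2.0, 6.0]

for name in ("concat", "sum", "average", "max"):
    print(name, aggregate(visual, knowledge, name))
```

Concatenation preserves every input dimension at the cost of a longer embedding, while summing, averaging, and taking the maximum keep the dimensionality fixed but merge the two sources into shared coordinates.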
