Multi-modal visual question answering system
Abstract
The systems and methods described herein may generate multi-modal embeddings that combine sub-symbolic features and symbolic features. The sub-symbolic embeddings may be generated with computer vision processing. The symbolic features may include mathematical representations of image content, enriched with information from background knowledge sources. The system may aggregate the sub-symbolic and symbolic features using techniques such as concatenation, averaging, summing, and/or taking the element-wise maximum. The multi-modal embeddings may be included in a multi-modal embedding model and trained via supervised learning. Once the multi-modal embeddings are trained, the system may generate an inference response to a natural language question about an input image by performing linear algebra operations on the multi-modal embeddings relevant to that response.
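The aggregation step and the linear-algebra inference step can be illustrated with a small sketch. This is not the patented implementation. The function names (`aggregate`, `dot`, `answer`), the toy vectors, and the choice of dot-product scoring over candidate answer embeddings are all hypothetical assumptions. The abstract names the four aggregation techniques but does not specify which linear algebra operations are used or how the question is encoded. Here the question is assumed to be already encoded as a vector of the same dimension and is simply added to the image's multi-modal embedding, and training is skipped by hard-coding "trained" embeddings.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def aggregate(sub_symbolic: Sequence[float], symbolic: Sequence[float],
              method: str = "concat") -> Vector:
    """Combine a sub-symbolic (vision) vector with a symbolic vector.

    Hypothetical helper: the four methods mirror the aggregation techniques
    named in the abstract. Element-wise methods need equal-length inputs.
    """
    if method == "concat":
        return list(sub_symbolic) + list(symbolic)
    if len(sub_symbolic) != len(symbolic):
        raise ValueError("element-wise aggregation needs equal-length vectors")
    pairs = list(zip(sub_symbolic, symbolic))
    if method == "sum":
        return [a + b for a, b in pairs]
    if method == "mean":
        return [(a + b) / 2 for a, b in pairs]
    if method == "max":
        return [max(a, b) for a, b in pairs]
    raise ValueError(f"unknown aggregation method: {method}")


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("dot product needs equal-length vectors")
    return sum(a * b for a, b in zip(u, v))


def answer(query: Sequence[float], candidates: Dict[str, Vector]) -> str:
    """Assumed inference rule: pick the candidate answer whose embedding
    has the largest dot product with the query embedding."""
    return max(candidates, key=lambda name: dot(query, candidates[name]))


if __name__ == "__main__":
    # Toy "trained" embeddings (made-up values for illustration only).
    image_sub_symbolic = [1.0, 0.0, 0.5]   # e.g. output of a vision model
    image_symbolic = [0.5, 0.0, 0.5]       # e.g. knowledge-enriched symbols
    image_embedding = aggregate(image_sub_symbolic, image_symbolic, "mean")

    question = [0.0, 0.0, 1.0]             # assumed pre-encoded question
    query = aggregate(image_embedding, question, "sum")

    candidates = {"dog": [1.0, 0.0, 0.5], "cat": [0.0, 1.0, 0.5]}
    print(answer(query, candidates))  # prints: dog
```

Concatenation preserves both feature sets but doubles the dimension, while the element-wise methods keep the dimension fixed and therefore require the sub-symbolic and symbolic vectors to live in spaces of the same size.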