Read, Watch, Listen, and Summarize: Multi-Modal Summarization for Asynchronous Text, Image, Audio and Video
Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China|Univ Chinese Acad Sci, Beijing 100190, Peoples R China;
Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China|Univ Chinese Acad Sci, Beijing 100190, Peoples R China;
Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China|Univ Chinese Acad Sci, Beijing 100190, Peoples R China;
Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China|Univ Chinese Acad Sci, Beijing 100190, Peoples R China;
Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100864, Peoples R China|Chinese Acad Sci, CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing 100864, Peoples R China|Univ Chinese Acad Sci, Beijing 100049, Peoples R China;
Summarization; multimedia; multi-modal; cross-modal; natural language processing; computer vision;
Read, watch, listen, and summarize: multi-modal summarization for asynchronous text, image, audio and video
Automatic music video summarization based on audio-visual-text analysis and alignment
Video summarization using event-related potentials at shot boundaries during real-time video viewing
Multi-modal summarization for asynchronous collection of text, image, audio and video
Hierarchical multi-modal video summarization analysis
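Toward answering biological questions with experimental evidence: automatically identifying text that describes image content in full-text articles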
Multi-modal summarization for asynchronous collection of text, image, audio and video